Abstract. Submodular function minimization is a key problem in a wide variety of applications in
machine learning, economics, game theory, computer vision and many others. The general solver has a
complexity of $O(n^6 + n^5 L)$, where $L$ is the time required to evaluate the function and $n$ is the number
of variables [22]. On the other hand, many useful applications in computer vision and machine learning
are defined over a special subclass of submodular functions that can be written
as the sum of many submodular cost functions defined over cliques containing few variables. For such
functions, the pseudo-Boolean (or polynomial) representation [2] is of degree (or
order, or clique size) $k$, where $k \ll n$. In this work, we develop efficient algorithms for the minimization
of this useful subclass of submodular functions. To do this, we define a novel mapping that transforms
submodular functions of order $k$ into quadratic ones, which can be efficiently minimized in $O(n^3)$
time using a max-flow algorithm. The underlying idea is to use auxiliary variables to model the higher
order terms; the transformation is found using a carefully constructed linear program. In particular,
we model the auxiliary variables as monotonic Boolean functions, allowing us to obtain a compact
transformation using as few auxiliary variables as possible. Specifically, we show that our approach
for fourth order functions requires only 2 auxiliary variables, in contrast to the 30 or more variables used in
existing approaches. In the general case, we give an upper bound for the number of auxiliary variables
required to transform a function of order $k$ using the Dedekind number, which is substantially lower than
the existing bound of $2^{2^k}$.

Keywords: submodular functions, quadratic pseudo-Boolean functions, monotonic Boolean functions, 
Dedekind number, max-flow/mincut algorithm 

1 Introduction 

Many optimization problems in several domains such as operations research, computer vision, machine
learning, and computational biology involve submodular function minimization. Submodular functions (see
Definition 1) are discrete analogues of convex functions [20]. Examples of such functions include cut
capacity functions, matroid rank functions and entropy functions. Submodular function minimization techniques
may be broadly classified into two categories: efficient algorithms for general submodular functions and
more efficient algorithms for subclasses of submodular functions. This paper falls under the second
category.

General solvers: The role of submodular functions in optimization was first discovered by Edmonds when
he gave several important results on the related polymatroids [4]. Grötschel, Lovász and Schrijver first
gave a polynomial-time algorithm for the minimization of submodular functions using the ellipsoid method [7].
Recently, several combinatorial and strongly polynomial algorithms [5, 11, 12, 27] have been developed based
on the work of Cunningham [3]. The current best strongly polynomial algorithm for minimizing general
submodular functions [22] has a run-time complexity of $O(n^5 L + n^6)$, where $L$ is the time taken to evaluate
the function and $n$ is the number of variables. Weakly polynomial time algorithms with a smaller dependence
on $n$ also exist. For example, to minimize the submodular function $f(x)$, the scaling algorithm of Iwata [13]
has a run-time complexity of $O((n^4 L + n^5) \log M)$. As before, $L$ refers to the time required to compute the
function $f$, and $M$ refers to the maximum absolute value of the function $f$.

Specialized solvers: There has been much recent interest in the use of higher order submodular functions
for better modeling of computer vision and machine learning problems [15, 19, 10]. Such problems typically
involve millions of pixels, making the use of general solvers infeasible. Further, each pixel may take
multiple discrete values, and the conversion of such a problem to a Boolean one introduces further variables.
On the other hand, the cost functions for many such optimization problems belong to a small subclass
of submodular functions. The goal of this paper is to provide an efficient approach for minimizing these
subclasses of submodular functions using a max-flow algorithm.

Definition 1. Submodular functions map $f : \mathbb{B}^n \to \mathbb{R}$ and satisfy the following condition:

$$f(X) + f(Y) \ge f(X \vee Y) + f(X \wedge Y) \qquad (1)$$

where $X$ and $Y$ are elements of $\mathbb{B}^n$.

In this paper, we use a pseudo-Boolean polynomial representation for denoting submodular functions. 

Definition 2. Pseudo-Boolean functions (PBF) take a Boolean vector as argument and return a real number,
i.e. $f : \mathbb{B}^n \to \mathbb{R}$ [2]. These can be uniquely expressed as multi-linear polynomials, i.e. for all $f$ there exists
a unique set of real numbers $\{a_S : S \in \mathbb{B}^n\}$ such that

$$f(x_1, \ldots, x_n) = \sum_{S \subseteq V} a_S \Big( \prod_{j \in S} x_j \Big), \quad a_S \in \mathbb{R}, \qquad (2)$$

where $a_\emptyset$ is said to be the constant term.
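
As an illustration of Definition 2, the coefficients $a_S$ of a small pseudo-Boolean function can be recovered by Möbius inversion over subsets, $a_S = \sum_{T \subseteq S} (-1)^{|S| - |T|} f(\mathbf{1}_T)$, where $\mathbf{1}_T$ is the labeling that sets exactly the variables in $T$ to 1. The sketch below is not part of the original development: the helper names `subsets`, `multilinear_coefficients` and `evaluate` are hypothetical, indices are taken to be $1, \ldots, n$, and the enumeration is exponential in $n$, so it is only meant for small cliques.

```python
from itertools import combinations, product

def subsets(items):
    """Yield every subset of `items` as a tuple, smallest subsets first."""
    items = tuple(items)
    for size in range(len(items) + 1):
        yield from combinations(items, size)

def multilinear_coefficients(f, n):
    """Return {S: a_S} such that f(x) = sum_S a_S * prod_{j in S} x_j.

    `f` takes a tuple of n Booleans; S is a tuple of 1-based indices.
    Zero coefficients are omitted.
    """
    def indicator(T):
        return tuple(1 if j in T else 0 for j in range(1, n + 1))

    coeffs = {}
    for S in subsets(range(1, n + 1)):
        # Moebius inversion over the subsets T of S.
        a_S = sum((-1) ** (len(S) - len(T)) * f(indicator(T)) for T in subsets(S))
        if a_S != 0:
            coeffs[S] = a_S
    return coeffs

def evaluate(coeffs, x):
    """Evaluate the multi-linear polynomial at the Boolean tuple x."""
    return sum(a_S for S, a_S in coeffs.items() if all(x[j - 1] for j in S))

# Hypothetical example: f(x1, x2) = max(x1, x2) = x1 + x2 - x1*x2.
example = multilinear_coefficients(lambda x: max(x), 2)
consistent = all(evaluate(example, x) == max(x) for x in product((0, 1), repeat=2))
```

Here `example` holds the three non-zero coefficients of $x_1 + x_2 - x_1 x_2$, and `consistent` confirms that the polynomial reproduces the function on all four labelings.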

The term order refers to the maximum degree of the polynomial. A submodular function of second order
involving Boolean variables can be easily represented using a graph such that the minimum cut, computed
using a max-flow algorithm, also efficiently minimizes the function. However, max-flow algorithms cannot
exactly minimize non-submodular functions or some submodular ones of an order greater than 3 [30]. There
is a long history of research in solving subclasses of submodular functions both exactly and efficiently using
max-flow algorithms [1, 16, 8, 29, 24]. In this paper we propose a novel linear programming formulation
that settles this question for any given pseudo-Boolean function: it derives a
quadratic submodular formulation of the same cost, should one exist, suitable for solving with graph cuts.
Where such a quadratic submodular formulation does not exist, it will find the closest quadratic submodular
function.

Let $\mathcal{F}^k$ denote the class of submodular Boolean functions of order $k$. It was first shown in [8] that any
function in $\mathcal{F}^2$ can be minimized exactly using a max-flow algorithm. It was later shown in [1, 16] that any function in
$\mathcal{F}^3$ can be transformed into a function in $\mathcal{F}^2$ and thereby minimized efficiently using max-flow algorithms.
The underlying idea is to transform the third order function to a function in $\mathcal{F}^2$ using extra variables, which
we refer to as auxiliary variables (AVs). In the course of this paper, we will see that these AVs are often
more difficult to handle than the variables in the original function, and our algorithms are driven by the quest to
understand the role of these auxiliary variables and to eliminate the unnecessary ones.

Recently, Živný et al. made substantial progress in characterizing the class of functions that can be
transformed to $\mathcal{F}^2$. Their most notable result is to show that not all functions in $\mathcal{F}^4$ can be transformed to a
function in $\mathcal{F}^2$. This result stands in strong contrast to the third order case, which was positively resolved more
than two decades earlier [1]. Using Theorem 5.2 from [23], it is possible to decompose a given submodular
function in $\mathcal{F}^4$ into 10 different groups $Q_i$, $i \in \{1, \ldots, 10\}$, where each $Q_i$ is shown in Table 1. Živný et al.
showed that one of these groups cannot be expressed using any function in $\mathcal{F}^2$ employing any number of
AVs. Most of these results were obtained by mapping the problem of minimizing submodular functions to a
valued constraint satisfaction problem.

1.1 Problem Statement and main contributions 

Largest subclass of submodular functions: We are interested in transforming a given function in $\mathcal{F}^k$ into a
function in $\mathcal{F}^2$ using AVs. As such a transformation is not possible for all submodular functions of order four
or more [30], our goal is to implicitly map the largest subclass $\mathcal{F}^k_2$ that can be transformed into $\mathcal{F}^2$. This
distinction between the two classes $\mathcal{F}^k$ and $\mathcal{F}^k_2$ will be crucial in the remainder of the paper (see Figure 1).




Fig. 1. All the functions in the classes $\mathcal{F}^1$, $\mathcal{F}^2$, $\mathcal{F}^3$ and $\mathcal{F}^k_2$ ($k > 2$) can be transformed to functions in $\mathcal{F}^2$ and minimized
using the maxflow/mincut algorithm.

Definition 3. The class $\mathcal{F}^k_2$ is the largest subclass of $\mathcal{F}^k$ such that every function $f(x) \in \mathcal{F}^k_2$ has an equivalent
quadratic function $h(x, z) \in \mathcal{F}^2$ using AVs $z = (z_1, z_2, \ldots, z_m) \in \mathbb{B}^m$ satisfying the following condition:

$$f(x) = \min_{z \in \mathbb{B}^m} h(x, z), \quad \forall x. \qquad (3)$$

In this paper, we are interested in developing an algorithm to transform every function in this class $\mathcal{F}^k_2$ to a
function in $\mathcal{F}^2$.

Efficient transformation of higher order functions: We propose a principled framework to transform higher
order submodular functions to quadratic ones using a combination of monotonic Boolean functions (MBFs)
and linear programming. This framework provides several advantages. First, we show that the state of an
AV in a minimum cost labeling is equivalent to an MBF defined over the original variables. This provides an
upper bound on the number of AVs given by the Dedekind number [17], which is defined as the total number
of MBFs over a set of $n$ binary variables. In the case of fourth order functions, there are 168 such functions.
Using the properties of MBFs and the nature of these AVs in our transformation, we prove that these 168 AVs
can be replaced by two AVs.



Srikumar Ramalingam, Chris Russell, Lubor Ladicky and Philip H.S. Torr



Minimal use of AVs: One of our goals is to use a minimum number ($m$) of AVs in performing the transformation
of (3). Although, given a fixed choice of $\mathcal{F}^k$, reducing the value of $m$ does not change the complexity
of the resulting min-cut algorithm asymptotically, it is crucial in several machine learning and computer
vision problems. In general, most image-based labeling problems involve millions of pixels, and in typical
problems the number of fourth order priors is linearly proportional to the number of pixels. Such problems
may be infeasible for large values of $m$. A recent work shows that functions in $\mathcal{F}^4_2$ can be transformed
using about 30 additional nodes [31]. On the other hand, we show that we can transform the same class of
functions using only 2 additional nodes. Note that this reduction is applicable to every fourth order term in
the function. A typical vision problem may involve functions having 10000 terms for an image of size
$100 \times 100$. Under these parameters, our algorithm will use 20000 AVs, whereas the existing approach [31]
would use as many as 300000 AVs. In several practical problems, this improvement will make a significant
difference in the running time of the algorithm.



1.2 Limitations of Current Approaches and Open Problems 

Decomposition of submodular functions: Many existing algorithms for transforming higher order functions
target the minimization of a single $k$-variable $k$-th order function. However, the transformation framework is
incomplete without showing that a given $n$-variable submodular function of $k$-th order can be decomposed
into several individual $k$-variable $k$-th order sub-functions. Billionnet proved that it is possible to decompose a
function in $\mathcal{F}^3$ involving several variables into 3-variable functions in $\mathcal{F}^3$ [1]. To the best of our knowledge,
the decomposition of fourth or higher order functions is still an open problem. We believe that this problem
will be difficult to resolve as, in general, determining if a fourth order function is submodular is co-NP complete
[6]. Given this, it is likely that specialized solvers based on max-flow algorithms may never solve the
general class of submodular functions. However, this decomposition problem is not a critical issue in machine
learning and vision problems. This is because the higher order priors from natural statistics already occur
in different sub-functions of $k$ nodes; in other words, the decomposition is known a priori. This paper
focuses only on the transformation of a single $k$-variable function in $\mathcal{F}^k$. As mentioned above, the solution to this
problem is still sufficient to solve large functions with hundreds of nodes and higher order priors in machine
learning and vision applications.

Non-Boolean problems: The results in this paper are applicable only to set or pseudo-Boolean functions.
Many real world problems involve variables that can take multiple discrete values. It is possible to convert
any submodular multi-label second order function to its corresponding quadratic Boolean function (QBF) [9, 26]. One can also
transform any multi-label higher order function (both submodular and non-submodular) to its corresponding
QBF by encoding each multi-label variable using several Boolean variables [25].

Excess AVs: The complexity of an efficient max-flow algorithm is $O((n + m)^3)$, where $n$ is the number of
variables in the original higher order function and $m$ is the number of AVs. Typically in imaging problems,
the number of higher order terms is $O(n)$ and the order $k$ is less than 10. Thus the minimization of
the function corresponding to an entire image with $O(n)$ higher order terms will still have a complexity
of $O((n + n)^3)$. However, when $m$ becomes at least quadratic in $n$, for example if a higher order term is
defined over every triple of variables in $V$, the complexity of the max-flow algorithm, being $O((n + n^3)^3)$,
will exceed that of a general solver. Thus in applications involving a very large number of higher order
terms, a general solver may be more appropriate.

2 Notation and preliminaries

In what follows, we use a vector $x$ to denote $\{x_1, x_2, x_3, \ldots, x_n\}$. Let $\mathbb{B}$ denote the Boolean set $\{0, 1\}$ and $\mathbb{R}$
the set of reals. Let the vector $x = (x_1, \ldots, x_n) \in \mathbb{B}^n$, and let $V = \{1, 2, \ldots, n\}$ be the set of indices of $x$. Let
$z = (z_1, z_2, \ldots, z_k) \in \mathbb{B}^k$ denote the AVs. We introduce a set representation to denote the labellings of $x$. Let
$S_4 = \{1, 2, 3, 4\}$ and let $\mathcal{P}$ be the power set of $S_4$. For example, the labeling $\{x_1 = 1, x_2 = 0, x_3 = 1, x_4 = 1\}$
is denoted by the set $\{1, 3, 4\}$.

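Definition 4. The (discrete) derivative of a function $f(x_1, \ldots, x_n)$ with respect to $x_i$ is given by:

$$\frac{\delta f}{\delta x_i}(x_1, \ldots, x_n) = f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n) \qquad (4)$$

Definition 5. The second discrete derivative $\Delta_{ij}(x)$ of a function $f$ is given by

$$\begin{aligned}
\Delta_{ij}(x) &= \frac{\delta}{\delta x_j} \frac{\delta f}{\delta x_i}(x_1, \ldots, x_n) \qquad (5) \\
&= \big( f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_{j-1}, 1, x_{j+1}, \ldots, x_n) - f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_{j-1}, 1, x_{j+1}, \ldots, x_n) \big) \\
&\quad - \big( f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_{j-1}, 0, x_{j+1}, \ldots, x_n) - f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_{j-1}, 0, x_{j+1}, \ldots, x_n) \big).
\end{aligned}$$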

Note that it follows from the definition of submodular functions (1) that their second derivative is always
non-positive for all $x$.
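
The second-derivative condition gives a direct, if exponential, test for submodularity of a small function: evaluate the second discrete derivative at every labeling and every pair of variables and check that it is never positive. The following is a minimal sketch under that reading; the names `second_derivative` and `is_submodular` are hypothetical, and indices are 0-based for convenience.

```python
from itertools import product

def second_derivative(f, x, i, j):
    """Second discrete derivative of f at x with respect to x_i and x_j (i != j).

    The values of x at positions i and j are ignored, as in Definition 5.
    """
    def at(xi, xj):
        y = list(x)
        y[i], y[j] = xi, xj
        return f(tuple(y))

    return (at(1, 1) - at(0, 1)) - (at(1, 0) - at(0, 0))

def is_submodular(f, n):
    """True if every second discrete derivative of f over B^n is non-positive."""
    return all(
        second_derivative(f, x, i, j) <= 0
        for x in product((0, 1), repeat=n)
        for i in range(n)
        for j in range(i + 1, n)
    )

# Hypothetical examples: -x1*x2*x3 is submodular, +x1*x2 is not.
negative_cubic = lambda x: -x[0] * x[1] * x[2]
positive_pair = lambda x: x[0] * x[1]
```

For `negative_cubic` each second derivative equals minus the remaining variable, so the check passes; for `positive_pair` the single second derivative is $+1$ and the check fails.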



3 Transforming functions in $\mathcal{F}^k_2$ to $\mathcal{F}^2$

Consider the following submodular function $f(x) \in \mathcal{F}^k_2$ represented as a multi-linear polynomial:

$$f(x) = \sum_{S \subseteq V} a_S \prod_{j \in S} x_j, \quad a_S \in \mathbb{R}. \qquad (6)$$

Let us consider a function $h(x, z) \in \mathcal{F}^2$, where $z$ is a set of AVs used to model functions in $\mathcal{F}^k_2$. Any
general function in $\mathcal{F}^2$ can be represented as a multi-linear polynomial (consisting of linear and bi-linear
terms involving all variables):

$$h(x, z) = \sum_i a_i x_i - \sum_{i,j : i > j} a_{i,j} x_i x_j + \sum_l a_l z_l - \sum_{l,m : l > m} a_{l,m} z_l z_m - \sum_{i,l} a_{i,l} x_i z_l. \qquad (7)$$

The negative signs in front of the bi-linear terms ($x_i x_j$, $z_l x_i$, $z_l z_m$) emphasize that their coefficients
($-a_{i,j}$, $-a_{i,l}$, $-a_{l,m}$) must be non-positive if the function is submodular. We are seeking a function $h$ such
that:

$$f(x) = \min_z h(x, z), \quad \forall x. \qquad (8)$$

Here the function /(x) is known. We are interested in computing the coefficients a, and in determining 
the number of auxiliary variables required to express a function as a pairwise submodular function. The 
problem is extremely challenging due to the inherent instability and dependencies within the problem - 
different choices of parameters cause auxiliary variables to take different states. To explore the space of 
possible solutions fully, we must characterize what states an AV takes. 
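
To make the target condition concrete, consider the single third order term $f(x) = -x_1 x_2 x_3$ and the quadratic candidate $h(x, z) = 2z - z x_1 - z x_2 - z x_3$ with one AV, a standard reduction of a negative cubic monomial. The sketch below simply checks $f(x) = \min_z h(x, z)$ by enumeration; the function names are hypothetical and the example is illustrative rather than the construction developed in this paper.

```python
from itertools import product

def f(x):
    """Third order submodular term f(x) = -x1*x2*x3."""
    return -x[0] * x[1] * x[2]

def h(x, z):
    """Quadratic candidate with one AV: h(x, z) = 2z - z*x1 - z*x2 - z*x3."""
    return 2 * z - z * (x[0] + x[1] + x[2])

def is_exact_transformation(f, h, n):
    """Check f(x) == min over z in {0, 1} of h(x, z) for every labeling x."""
    return all(f(x) == min(h(x, 0), h(x, 1)) for x in product((0, 1), repeat=n))

# A deliberately wrong candidate (linear AV cost 1 instead of 2) for contrast.
wrong_h = lambda x, z: z - z * (x[0] + x[1] + x[2])
```

The AV switches on only when all three variables are 1, which is the one labeling where the cubic term is active. Note how the state of $z$ depends on the chosen coefficients: with `wrong_h` the AV already switches on when two variables are 1, and the minimum no longer matches $f$.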
3.1 Auxiliary Variables as Monotonic Boolean Functions

Definition 6. A monotonic (increasing) Boolean function (MBF) $m : \mathbb{B}^n \to \mathbb{B}$ takes a Boolean vector as
argument and returns a Boolean, such that if $y_i \le x_i\ \forall i$ then $m(y) \le m(x)$.

Lemma 1. The function $z_s(x)$ defined by

$$z_s(x) = \arg\min_{z_s} \Big( \min_{z'} h(x, z', z_s) \Big), \qquad (9)$$

i.e. the function that maps from $x$ to the Boolean state of $z_s$, is an MBF (see Definition 6), where $z'$ is the set of all
auxiliary variables except $z_s$.

Proof. We consider a current labeling $x$ with an induced labeling of $z_s = z_s(x)$. We first note that

$$h'(x, z_s) = \min_{z'} h(x, z', z_s) \qquad (10)$$

is a submodular function, i.e. it satisfies (1). We now consider increasing the value of $x$; that is, given a current
labeling $x$ we consider a new labeling $x^{(i)}$ such that

$$x^{(i)}_j = \begin{cases} 1 & \text{if } j = i, \\ x_j & \text{otherwise.} \end{cases} \qquad (11)$$

We wish to prove

$$z_s(x^{(i)}) \ge z_s(x) \quad \forall x, i. \qquad (12)$$

Note that if $z_s(x) = 0$ or $x_i = 1$ this result is trivial. This leaves the case $z_s(x) = 1$ and $x_i = 0$. Writing the
state of $z_s$ as the last argument of $h'$, it follows from (5) that:

$$h'(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, 0) - h'(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, 0) \ge
h'(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, 1) - h'(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, 1). \qquad (13)$$

As, by hypothesis, $z_s(x) = 1$ and $x_i = 0$, we have:

$$h'(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, 0) \ge h'(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, 1). \qquad (14)$$

Hence

$$h'(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, 0) - h'(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, 0) \ge
h'(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, 1) - h'(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, 0), \qquad (15)$$

and

$$h'(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, 0) \ge h'(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, 1). \qquad (16)$$

Therefore $z_s(x^{(i)}) = 1$. Repeated application of the statement gives $y_i \le x_i\ \forall i \Rightarrow z_s(y) \le z_s(x)$, as
required. □
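
Lemma 1 can be checked numerically on small examples. The sketch below assumes a single AV and breaks ties toward $z = 1$, matching the non-strict inequalities used in the proof; it tests only single-variable increases, which suffices by repeated application. The names `induced_state` and `is_monotone` are hypothetical.

```python
from itertools import product

def induced_state(h, x):
    """State of the AV that minimizes h(x, z); ties are broken toward z = 1."""
    return 1 if h(x, 1) <= h(x, 0) else 0

def is_monotone(h, n):
    """Check that raising any single x_i from 0 to 1 never lowers the induced AV state.

    By transitivity this is equivalent to z(y) <= z(x) whenever y <= x componentwise.
    """
    for y in product((0, 1), repeat=n):
        for i in range(n):
            if y[i] == 0:
                x = y[:i] + (1,) + y[i + 1:]
                if induced_state(h, y) > induced_state(h, x):
                    return False
    return True

# Hypothetical submodular example: h(x, z) = 2z - z*(x1 + x2 + x3).
submodular_h = lambda x, z: 2 * z - z * sum(x)
# Hypothetical non-submodular example: the x1*z coefficient is positive.
non_submodular_h = lambda x, z: z * (x[0] - 0.5)
```

With `submodular_h` the AV switches on once at least two variables are set, which is a monotone threshold function; with `non_submodular_h` the AV switches off when $x_1$ is raised, so monotonicity fails, as expected when the bilinear coefficient has the wrong sign.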

Definition 7. The Dedekind number $M(n)$ is the number of MBFs of $n$ variables. Finding a closed-form
expression for $M(n)$ is known as the Dedekind problem [14, 17].

The known values of the Dedekind number are shown below. $M(1) = 3$; this corresponds to the set of functions

$$M_1(x_1) \in \{0, 1, x_1\}, \qquad (17)$$

where 0 and 1 are the functions that take any input and return 0 or 1 respectively. $M(2) = 6$, corresponding
to the set of functions

$$M_2(x_1, x_2) = \{0, 1, x_1, x_2, x_1 \vee x_2, x_1 \wedge x_2\}. \qquad (18)$$

Similarly, $M(3) = 20$, $M(4) = 168$, $M(5) = 7581$, $M(6) \approx 7.8 \times 10^6$, $M(7) \approx 2.4 \times 10^{12}$, and
$M(8) \approx 5.6 \times 10^{22}$. For larger values of $n$, $M(n)$ remains unknown, and the development of a closed form
solution remains an active area of research.
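
The first few Dedekind numbers are small enough to confirm by brute force: enumerate every truth table over $n$ variables and keep those that never decrease when a single input is raised. The sketch below does exactly that; `count_mbfs` is a hypothetical helper, and the double-exponential enumeration is only practical for $n \le 4$.

```python
def count_mbfs(n):
    """Count monotone Boolean functions of n variables by brute force.

    A function is stored as a truth-table bitmask: bit p holds the value at the
    labeling whose binary encoding is p. Feasible only for small n (n <= 4).
    """
    points = range(1 << n)
    # Pairs (y, x) where x is y with exactly one extra variable set to 1.
    covers = [(y, y | (1 << i)) for y in points for i in range(n) if not y & (1 << i)]
    return sum(
        all(((table >> y) & 1) <= ((table >> x) & 1) for y, x in covers)
        for table in range(1 << (1 << n))
    )

counts = [count_mbfs(n) for n in range(1, 5)]
```

The resulting list reproduces $M(1), \ldots, M(4)$ as quoted above. Checking only pairs that differ in one variable is enough, since any componentwise increase is a chain of such single steps.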

Lemma 2. On transforming the largest graph-representable subclass of $k$-th order functions to pairwise
Boolean functions, the upper bound on the maximal number of required AVs is given by the Dedekind number
$M(k)$.

Proof. The proof is straightforward. Consider a general multinomial, of similar form to equation (7), with
more than $M(k)$ AVs. It follows from Lemma 1 that at least 2 of the AVs must correspond to the same
MBF and always take the same values. Hence, all references to one of these AVs in the pseudo-Boolean
representation can be replaced with references to the other, without changing the associated costs. Repeated
application of this process will leave us with a solution with at most $M(k)$ AVs. □

Although this upper bound is large even for small values of $k$, it is much tighter than the existing upper bound
of $S(k) = 2^{2^k}$ (see Proposition 24 in [32]). Even for small values of $k \in \{3, \ldots, 8\}$ the upper bound using
Dedekind's number is much smaller: $(M(3) = 20, S(3) = 256)$, $(M(4) = 168, S(4) = 65536)$,
$(M(5) = 7581, S(5) \approx 4.29 \times 10^9)$, $(M(6) \approx 7.8 \times 10^6, S(6) \approx 1.85 \times 10^{19})$,
$(M(7) \approx 2.4 \times 10^{12}, S(7) \approx 3.4 \times 10^{38})$ and $(M(8) \approx 5.6 \times 10^{22}, S(8) \approx 1.158 \times 10^{77})$.
Živný et al. have emphasized the importance of improving
this upper bound. In Section 5, we will further tighten the bound for fourth order functions.

Note that this representation of AVs as MBFs is over-complete. For example, if the MBF of an auxiliary
variable $z_i$ is the constant function $z_i(x) = 1$, we can replace $\min_{z, z_i} h(x, z, z_i)$ with the simpler (i.e.
containing fewer auxiliary variables) function $\min_z h(x, z, 1)$. Despite this, this is sufficient preliminary work
for our main result:

Theorem 1. Given any function $f$ in $\mathcal{F}^k_2$, the equivalent pairwise form $f' \in \mathcal{F}^2$ can be found by solving a
linear program.

The construction of the linear program is given in the following section. 

4 The Linear Program 

A sketch of the formulation can be given as follows. In general, the presence of AVs of indeterminate state,
given a labeling $x$, makes the minimization problem non-convex and challenging to solve directly. Instead of
optimizing this problem containing AVs of unspecified state, we create an auxiliary variable associated with
every MBF. Hence, given any labeling $x$, the state of every auxiliary variable is fixed a priori, making the
problem convex. We show how the constraints that a particular AV must conform to a given MBF can be
formulated as linear constraints, and that consequently the problem of finding the closest member $f' \in \mathcal{F}^2$
to any pseudo-Boolean function is a linear program.

This program will make use of the max-flow linear program formulation to guarantee that the minimum
cost labeling of the AVs corresponds to their MBFs. To do this we must first rewrite the cost of equation (7)
in a slightly different form. We write:

$$\begin{aligned}
f(x, z) = c_0 &+ \sum_i c_{i,s} (1 - x_i) + \sum_i c_{t,i} x_i + \sum_{i,j : i > j} c_{i,j} x_i (1 - x_j) \\
&+ \sum_l c_{l,s} (1 - z_l) + \sum_l c_{t,l} (1 - z_l) + \sum_{l,m : l > m} c_{l,m} z_l (1 - z_m) + \sum_{i,l} c_{i,l} x_i (1 - z_l), \qquad (19)
\end{aligned}$$

where $c_0$ is a constant that may be either positive or negative and all other $c$ are non-negative values referred
to as the capacity of an edge. By [16, 1], this form is equivalent to that of (7), in that any function that can
be written in form (7) can also be written as (19) and vice versa.



4.1 The Max-flow Linear Program 

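Under the assumption that $x$ is fixed, we are interested in finding a minimum of the equation:

$$\begin{aligned}
f_x(z) = c_0 &+ \sum_i c_{i,s} (1 - x_i) + \sum_i c_{t,i} x_i + \sum_{i,j : i > j} c_{i,j} x_i (1 - x_j) \\
&+ \sum_l c_{l,s} (1 - z_l) + \sum_l c_{t,l} (1 - z_l) + \sum_{l,m : l > m} c_{l,m} z_l (1 - z_m) + \sum_{i,l} c_{i,l} x_i (1 - z_l) \\
= d_{x,0} &+ \sum_l d_{x,l,s} (1 - z_l) + \sum_l d_{x,t,l} (1 - z_l) + \sum_{l,m : l > m} d_{x,l,m} z_l (1 - z_m), \qquad (20)
\end{aligned}$$

where

$$d_{x,0} = c_0 + \sum_{i : x_i = 0} c_{i,s} + \sum_{i : x_i = 1} c_{t,i} + \sum_{i,j : i > j \wedge x_i = 1 \wedge x_j = 0} c_{i,j}, \qquad (21)$$

$$d_{x,l,s} = c_{l,s} + \sum_{i : x_i = 1} c_{i,l}, \quad d_{x,t,l} = c_{t,l} \quad \text{and} \quad d_{x,l,m} = c_{l,m}. \qquad (22)$$

Then the minimum cost of equation (19) may be found by solving its dual max-flow program. Writing $V_{x,s}$
for the flow from the source, and $V_{x,t}$ for the flow to the sink, we seek

$$\max\ V_{x,s} + d_{x,0} \qquad (23)$$

subject to the constraints that

$$\begin{aligned}
f_{x,ij} - d_{x,ij} &\le 0 && \forall (i, j) \in E \\
\sum_{j : (j,i) \in E} f_{x,ji} - \sum_{j : (i,j) \in E} f_{x,ij} &= 0 && \forall i \ne s, t \\
V_{x,s} + \sum_{j : (j,s) \in E} f_{x,js} - \sum_{j : (s,j) \in E} f_{x,sj} &\le 0 && \qquad (24) \\
V_{x,t} + \sum_{j : (j,t) \in E} f_{x,jt} - \sum_{j : (t,j) \in E} f_{x,tj} &\le 0 \\
f_{x,ij} &\ge 0 && \forall (i, j) \in E
\end{aligned}$$

where $E$ is the set of all ordered pairs $(l, m) : \forall l > m$, $(s, l) : \forall l$ and $(l, t) : \forall l$, and $f_{x,ij}$ corresponds to
the flow through the edge $(i, j)$.

We will not use this exact LP formulation, but instead rely on the fact that $z$ is a minimal cost labeling of $f_x$
if and only if there exists a flow satisfying constraints (24) such that

$$f_x(z) - V_{x,s} - d_{x,0} \le 0. \qquad (25)$$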



Efficient Minimization of Higher Order Submodular Functions using Monotonic Boolean Functions

4.2 Choice of MBF as a set of linear constraints

We are seeking minima of a quadratic pseudo-Boolean function of the form (19), where $x$ are the variables we
are interested in minimizing over and $z$ are the auxiliary variables. As previously mentioned, formulations that allow
the state of the auxiliary variables to vary tend to result in non-convex optimization problems. To avoid such
difficulties, we specify the location of the minima of $z$ as a set of hard constraints. We require that:

$$\min_z f_x(z) = f_x([m_1(x), m_2(x), \ldots, m_{M(k)}(x)]) \quad \forall x, \qquad (26)$$

where $f_x$ is defined as in (20), and $m_1, \ldots, m_{M(k)}$ are the set of all possible MBFs defined over $x$. By setting
all of the capacities $d_{i,j}$ to 0, it can be seen that a solution satisfying (26) must exist. It follows from the
reduction described in Lemma 1 that all functions that can be expressed in a pairwise form can also be
expressed in a form that satisfies these restrictions.

We enforce condition (26) by the set of linear constraints (24) and (25) for all possible choices of $x$.
Formally, we enforce the condition

$$f_x([m_1(x), \ldots, m_{M(k)}(x)]) - V_{x,s} - d_{x,0} \le 0. \qquad (27)$$

Substituting in (20), we have $2^k$ sets of conditions, namely

$$\sum_l d_{x,l,s} (1 - m_l(x)) + \sum_l d_{x,t,l} (1 - m_l(x)) + \sum_{l,m : l > m} d_{x,l,m}\, m_l(x) (1 - m_m(x)) - V_{x,s} \le 0, \qquad (28)$$

subject to the set of constraints (24) for all $x$. Note that we make use of the max-flow formulation, and not
the more obvious min-cut formulation, as this remains a linear program even if we allow the capacity of
edges $d$ (footnote 1) to vary.



Submodularity Constraints: We further require that the quadratic function is submodular or, equivalently, that the
capacity of all edges $c_{i,j}$ is non-negative. This can be enforced by the set of linear constraints

$$c_{i,j} \ge 0 \quad \forall i, j. \qquad (29)$$



4.3 Finding the nearest submodular Quadratic Function 

Footnote 1: In itself, $d$ is just a notational convenience, being a sum of coefficients in $c$.

We now assume that we have been given an arbitrary function $g(x)$ to minimize, that may or may not lie in
$\mathcal{F}^k_2$. We are interested in finding the closest possible function in $\mathcal{F}^2$ to it. To find the closest function to it
(under the $\ell_1$ norm), we minimize:

$$\begin{aligned}
&\min_c \sum_{x \in \mathbb{B}^k} \Big| g(x) - \min_z f(x, z) \Big| = \qquad (30) \\
&\min_c \sum_{x \in \mathbb{B}^k} \Big| g(x) - f(x, m(x)) \Big| = \qquad (31) \\
&\min_c \sum_{x \in \mathbb{B}^k} \Big| g(x) - \Big( c_0 + \sum_i c_{i,s} (1 - x_i) + \sum_i c_{t,i} x_i + \sum_{i,j : i > j} c_{i,j} x_i (1 - x_j) \qquad (32) \\
&\qquad + \sum_l c_{l,s} (1 - m_l(x)) + \sum_l c_{t,l} (1 - m_l(x)) + \sum_{l,m : l > m} c_{l,m}\, m_l(x) (1 - m_m(x)) \\
&\qquad + \sum_{i,l} c_{i,l}\, x_i (1 - m_l(x)) \Big) \Big|
\end{aligned}$$

where $m(x) = [m_1(x), \ldots, m_{M(k)}(x)]$ is the vector of all MBFs over $x$, and subject to the family of
constraints set out in the previous subsection. Note that minimizing an expression of the form $\sum_i |g_i|$ can be written as
minimizing $\sum_i h_i$ subject to the linear constraints $h_i \ge g_i$ and $h_i \ge -g_i$, and this is a linear program. □
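
Written out, the absolute-value substitution turns the $\ell_1$ objective into a program of the following shape. This is a sketch under stated assumptions: $t_x$ is a hypothetical slack variable introduced here, one per labeling $x$, and $f(x, m(x))$ denotes the cost (19) with every AV fixed to the value of its MBF, which is linear in the capacities $c$ for each fixed $x$.

```latex
\begin{aligned}
\min_{c,\, t} \quad & \sum_{x \in \mathbb{B}^k} t_x \\
\text{s.t.} \quad & t_x \ge g(x) - f(x, m(x)), \\
                  & t_x \ge f(x, m(x)) - g(x), \qquad \forall x \in \mathbb{B}^k,
\end{aligned}
```

together with the flow constraints (24), the MBF conditions (28) and the non-negativity constraints (29). At the optimum each $t_x$ equals $|g(x) - f(x, m(x))|$, since the objective pushes $t_x$ down onto the larger of its two lower bounds.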



4.4 Discussion 

Several results follow from this. In particular, if we consider a function $g$ of the same form as equation (2),
the set of equations such that

$$\min_c \sum_{x \in \mathbb{B}^k} \Big| g(x) - \min_z f(x, z) \Big| = 0 \qquad (33)$$

exactly defines a linear polytope for any choice of $|x| = k$, and this result holds for any choice of basis
functions.

Of equal note, the convex-concave procedure [28] is a generic move-making algorithm that finds local
optima by successively minimizing a sequence of convex (i.e. tractable) upper-bound functions that are tight
at the current location ($x'$). The work [21] showed how this could be similarly done for quadratic Boolean functions, by
decomposing them into submodular and supermodular components. The work [18] showed that any function
could be decomposed into a quadratic submodular function and an additional overestimated term.
Nevertheless, this decomposition was not optimal, and they did not suggest how to find an optimal overestimation.
The optimal overestimation which lies in $\mathcal{F}^2$ for a cost function $g$ defined over a clique may be found by
solving the above LP subject to the additional requirements:

$$g(x) \le f(x, z) \quad \forall x \qquad (34)$$
$$g(x') \ge f(x', z) \qquad (35)$$

Efficiency concerns: As we consider larger cliques, it becomes less computationally feasible to use the
techniques discussed in this section, at least without pruning the number of auxiliary variables considered.
As previously mentioned, constant AVs and AVs that correspond to a single variable in $x$, i.e. $z_l = x_i$,
can be safely discarded without loss of generality. In the following section, we show that a function in $\mathcal{F}^4_2$
can be represented by only two AVs, rather than 168 as suggested by the number of possible MBFs. However,
in the general case a minimal form representation eludes us. As a matter of pragmatism, it may be useful to
attempt to solve the LP of the previous section without making use of any AVs, and to successively introduce
new variables until a minimum cost solution is found.

5 Tighter Bounds: Transforming functions in $\mathcal{F}^4_2$ to $\mathcal{F}^2$

Consider the following submodular function $f(x_1, x_2, x_3, x_4) \in \mathcal{F}^4$ represented as a multi-linear
polynomial:

$$f(x_1, x_2, x_3, x_4) = a_0 + \sum_i a_i x_i + \sum_{i > j} a_{ij} x_i x_j + \sum_{i > j > k} a_{ijk} x_i x_j x_k + a_{1234} x_1 x_2 x_3 x_4, \quad \Delta_{ij}(x) \le 0, \qquad (36)$$

where $i, j, k \in S_4$ and $\Delta_{ij}(x)$ is the discrete second derivative of $f(x)$ with respect to $x_i$ and $x_j$.

Consider a function $h(x_1, x_2, x_3, x_4, z_s) \in \mathcal{F}^2$, where $z_s$ is an AV used to model functions in $\mathcal{F}^4_2$. Any
general function in $\mathcal{F}^2$ can be represented as a multi-linear polynomial (consisting of linear and bilinear
terms involving all five variables):

$$h(x_1, x_2, x_3, x_4, z_s) = b_0 + \sum_i b_i x_i - \sum_{i > j} b_{ij} x_i x_j + \Big( g_s - \sum_{i=1}^{4} g_{s,i} x_i \Big) z_s, \quad b_{ij} \ge 0,\ g_{s,i} \ge 0,\ i, j \in S_4. \qquad (37)$$

The negative signs in front of the bilinear terms ($x_i x_j$, $z_s x_i$) emphasize that their coefficients ($-b_{ij}$, $-g_{s,i}$)
must be non-positive to ensure submodularity. We have the following condition from equation (3):

$$f(x_1, x_2, x_3, x_4) = \min_{z_s \in \mathbb{B}} h(x_1, x_2, x_3, x_4, z_s), \quad \forall x. \qquad (38)$$

Here the coefficients ($a_i$, $a_{ij}$, $a_{ijk}$, $a_{ijkl}$) in the function $f(x)$ are known. We wish to compute the
coefficients ($b_i$, $b_{ij}$, $g_s$, $g_{s,n}$), where $i, j \in V$, $i \ne j$, $n \in S_4$. If we were given ($g_s$, $g_{s,i}$), then from equations (37)
and (38) we would have

$$z_s = \begin{cases} 1 & \text{if } g_s - \sum_{i=1}^{4} g_{s,i} x_i < 0, \\ 0 & \text{otherwise.} \end{cases} \qquad (39)$$

The value of $z_s$ that minimizes equation (38) is dependent both upon the assignment of $\{x_1, x_2, x_3, x_4\}$ and
upon the coefficients ($g_s$, $g_{s,1}$, $g_{s,2}$, $g_{s,3}$, $g_{s,4}$). The four variables $x_1$, $x_2$, $x_3$ and $x_4$ can be assigned
16 different labellings, giving 16 equations of the following form:

$$f(x_1, x_2, x_3, x_4) = \underbrace{h(x_1, x_2, x_3, x_4, 0)}_{h_1} + \underbrace{\min_{z_s \in \mathbb{B}} \Big( g_s - \sum_{i=1}^{4} g_{s,i} x_i \Big) z_s}_{h_2} \qquad (40)$$



The function $h_1$ is the part of $h$ not dependent on $z_s$, and $h_2$ is the part dependent on $z_s$. Our main result
is to prove that any function $h \in \mathcal{F}^2$ can be transformed to a function $h'(x_1, x_2, x_3, x_4, z_{j1}, z_{j2}) \in \mathcal{F}^2$
involving only two auxiliary variables $z_{j1}$ and $z_{j2}$. Using this result we can transform a given function
$f(x_1, x_2, x_3, x_4) \in \mathcal{F}^4_2$, the form of which we characterize later, to a function
$h'(x_1, x_2, x_3, x_4, z_{j1}, z_{j2}) \in \mathcal{F}^2$.

Let $\mathcal{A}$ be the family of sets corresponding to labellings of $x$ such that $z_s = 0 = \arg\min_{z_s} h(x, z_s)$. In the
same way, let $\mathcal{B}$ be the family of sets corresponding to labellings of $x$ such that $z_s = 1 = \arg\min_{z_s} h(x, z_s)$.
These sets $\mathcal{A}$ and $\mathcal{B}$ partition the labellings of $x$, as defined below:

Definition 8. A partition divides the power set $\mathcal{P}$ of $S_4$ into sets $\mathcal{A}$ and $\mathcal{B}$ such that
$\mathcal{A} = \{S(x) : 0 = \arg\min_{z \in \mathbb{B}} h(x, z),\ x \in \mathbb{B}^4\}$, where $S(x)$ is the set representation of the labeling $x$,
and $\mathcal{B} = \mathcal{P} \setminus \mathcal{A}$. Note that $\emptyset \in \mathcal{A}$.

Fig. 2. Hasse diagrams of sample partitions. Here, we use the set representation for denoting the labellings
of $(x_1, x_2, x_3, x_4)$. For example, the set $\{1, 2, 4\}$ is equivalent to the labeling $\{x_1 = 1, x_2 = 1, x_3 = 0, x_4 = 1\}$.
In (a), $\mathcal{A} = \{\{\}, \{2\}, \{3\}, \{4\}, \{2, 3\}, \{2, 4\}, \{3, 4\}, \{2, 3, 4\}\}$ and
$\mathcal{B} = \{\{1\}, \{1, 2\}, \{1, 3\}, \{1, 4\}, \{1, 2, 3\}, \{1, 2, 4\}, \{1, 3, 4\}, S_4\}$. (a) and (b) are examples of partitions. On searching the
space of all possible partitions ($2^{16}$) we found that only 168 partitions belong to this class. These are the only partitions
which will be useful in our analysis because any arbitrary AV must be associated with one of these 168 partitions. (See
text for the relation between these partitions and MBFs.)

In the rest of the paper, we say that the AV $z_s$ is associated with $[\mathcal{A}, \mathcal{B}]$, or denote it by $z_s : [\mathcal{A}, \mathcal{B}]$. We
illustrate the concept of a partition in Figure 2.
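
The partition induced by a single AV can be computed directly from its coefficients: a labeling falls in $\mathcal{B}$ exactly when switching the AV on strictly lowers the cost, i.e. when $g_s - \sum_i g_{s,i} x_i < 0$. The sketch below assumes this threshold rule and the set representation of labellings; the function name `partition` and the example coefficients are hypothetical.

```python
from itertools import combinations

S4 = (1, 2, 3, 4)

def power_set(items):
    """All subsets of `items` as frozensets, smallest first."""
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def partition(g_s, g):
    """Split the 16 labellings of (x1..x4) into (A, B) for one AV.

    `g_s` is the linear AV coefficient and `g[i]` the non-negative coefficient
    coupling the AV to x_i. B holds the labellings where z_s = 1 is strictly better.
    """
    A, B = [], []
    for S in power_set(S4):
        slope = g_s - sum(g[i] for i in S)
        (B if slope < 0 else A).append(S)
    return A, B

# Hypothetical coefficients: the AV is coupled to x1 only.
A, B = partition(1, {1: 2, 2: 0, 3: 0, 4: 0})
```

With these coefficients $\mathcal{B}$ consists of the eight sets containing 1 and $\mathcal{A}$ of the eight sets that do not, which is the partition listed for panel (a) of Figure 2.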

From lemma 2, we could use 168 different AVs in our transformation. However, we show that the same 
class can be represented using only two AVs. In other words, all existing partitions could be converted to 
these two reference partitions represented by two AVs taking the states shown below. 

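Definition 9. The forward reference partition $[\mathcal{A}_f, \mathcal{B}_f]$ takes the form:

$$B \in \mathcal{B}_f \iff |B| \ge 3, \quad \mathcal{A}_f = \mathcal{P} \setminus \mathcal{B}_f \tag{41}$$

On the other hand, a backward reference partition $[\mathcal{A}_b, \mathcal{B}_b]$ is shown below:

$$B \in \mathcal{B}_b \iff |B| \ge 2, \quad \mathcal{A}_b = \mathcal{P} \setminus \mathcal{B}_b \tag{42}$$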

The forward and backward reference partitions are shown in figure 3. Note that these reference partitions 
satisfy the properties of a matroid. Here we treat A as the family of subsets of the ground set S4. More 
specifically, these reference partitions satisfy the conditions of a uniform matroid (see appendix). 

We approach this problem by first considering the simplified case in which no interactions between AVs 
are allowed. This is covered in section 5.1, while section 5.2 builds on these results to handle the case of 
pairwise interactions between AVs.

5.1 Non-interacting AVs

Here we study the role of each AV independently. In other words, we do not consider interactions between AVs that
involve bilinear terms such as $z_i z_j$. The following lemmas and theorems enable the replacement of AVs with
other AVs closer to the reference partitions. By successively applying replacement algorithms, we gradually
replace all the AVs with the two AVs in the forward and backward reference partitions.
Fig. 3. The two matroidal generators used to represent all functions in $\mathcal{F}^4_2$. Note that the bilinear term $z_{j1} z_{j2}$ is active,
i.e. $z_{j1} z_{j2} = 1$, in the region of overlap.

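Lemma 3. Let $z_s : [\mathcal{A}_s, \mathcal{B}_s]$ be an AV in a function $h(\mathbf{x}, z_s)$ in $\mathcal{F}^2$, then $h$ can be transformed to some
function $h'(\mathbf{x}, z_t)$ in $\mathcal{F}^2$ involving $z_t : [\mathcal{A}_t, \mathcal{B}_t]$, such that for all $B \in \mathcal{B}_t$, $|B| \ge 2$.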

Proof. We say that a function $h$ can be transformed to $h'$ if $\min_{z_s} h(\mathbf{x}, z_s) = \min_{z_t} h'(\mathbf{x}, z_t)$, $\forall \mathbf{x}$. It does
not imply that $h(\mathbf{x}, z_s) = h'(\mathbf{x}, z_t)$, $\forall \mathbf{x}$. We first consider the case where $\emptyset \in \mathcal{B}_s$. If this is the case,
$\arg\min_{z_s} h(\mathbf{x}, z_s) = 1$, $\forall \mathbf{x}$. Hence we can transform $h(\mathbf{x}, z_s)$ to $h'(\mathbf{x})$ and the lemma holds trivially. Next
we assume that there exists a singleton $\{e\} \in \mathcal{B}_s$, i.e. $\{e\}$ is {1}, {2}, {3} or {4}. We decompose $h$ as:

$$\min_{z_s} h(x_1, x_2, x_3, x_4, z_s) = h_1(x_1, x_2, x_3, x_4) + \min_{z_s} \underbrace{\Big(g_s - \sum_{i=1}^{4} g_{s,i} x_i\Big) z_s}_{h_2}$$

where $h_2$ is the part of $h$ dependent on $z_s$.

As $\{e\} \in \mathcal{B}_s$, $g_s - g_{s,e} < 0$. As a result, $z_s = 1$ when $x_e = 1$, i.e. $x_e z_s = x_e$. In the above
equation we replace $x_e z_s$ using simply $x_e$ to obtain the following:

$$\min_{z_s} h_2 = \min_{z_s} \Big( (g_s - g_{s,e}) x_e + \Big(g_s - g_s x_e - \sum_{i \in S_4 \setminus e} g_{s,i} x_i\Big) z_s \Big).$$

The decomposition of the original function can then be written, replacing $z_s$ by $z_t$:

$$h' = h_1 + (g_s - g_{s,e}) x_e + \Big(g_s - g_s x_e - \sum_{i \in S_4 \setminus e} g_{s,i} x_i\Big) z_t.$$

A sample reduction for this lemma is shown in figure 4. Note that $h'_2$ equals 0 for the singleton $\{e\}$.
Similarly any other singleton $\{e'\}$ can also be removed from $\mathcal{B}_s$ using the same approach. After repeated
application, our final partition $\mathcal{B}_t$ does not contain any singletons. □
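
The singleton-removal step of lemma 3 can be checked numerically on the parameters given for figure 4. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, indices are 0-based, and the part $h_1$ is left out because it is unchanged by the step.

```python
from itertools import product


def min_over_z(g, weights, x):
    """min over z in {0, 1} of (g - sum_i weights[i] * x[i]) * z."""
    return min(0, g - sum(w * xi for w, xi in zip(weights, x)))


def remove_singleton(g_s, weights_s, e):
    """Apply the lemma 3 step for a singleton {e} in B_s.

    Returns (unary, g_t, weights_t): the coefficient (g_s - g_{s,e})
    moved onto x_e, and the parameters of the replacement AV z_t.
    """
    unary = g_s - weights_s[e]
    weights_t = list(weights_s)
    weights_t[e] = g_s  # the term g_s * x_e * z_t replaces g_{s,e} * x_e * z_s
    return unary, g_s, weights_t


def check_lemma3(g_s, weights_s, e):
    """Compare both sides of the transformation on all 16 labellings."""
    unary, g_t, weights_t = remove_singleton(g_s, weights_s, e)
    return all(
        min_over_z(g_s, weights_s, x)
        == unary * x[e] + min_over_z(g_t, weights_t, x)
        for x in product((0, 1), repeat=4)
    )
```

With $g_s = 3$ and $(g_{s,1}, \ldots, g_{s,4}) = (4, 1, 1, 1)$ the step yields $g_t = 3$ and $(g_{t,1}, \ldots, g_{t,4}) = (3, 1, 1, 1)$, the values listed in the caption of figure 4.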
Fig. 4. An example of lemma 3. The AV $z_s$ is replaced by $z_t$ and the associated partitions $[\mathcal{A}_s, \mathcal{B}_s]$ and $[\mathcal{A}_t, \mathcal{B}_t]$ are
shown in (a) and (b) respectively. The initial and the final set of parameters are given by: ($g_s = 3$, $g_{s,1} = 4$, $g_{s,2} = 1$,
$g_{s,3} = 1$, $g_{s,4} = 1$), ($g_t = 3$, $g_{t,1} = 3$, $g_{t,2} = 1$, $g_{t,3} = 1$, $g_{t,4} = 1$). In the initial partition we have the singleton
$\{1\} \in \mathcal{B}_s$. After the transformation all the singletons $\{e\} \in \mathcal{A}_t$.

Lemma 4. Any function $h(\mathbf{x}, z_s)$ in $\mathcal{F}^2$ with $z_s$ associated with the partition $[\mathcal{A}_s, \mathcal{B}_s]$ satisfying the condition
$\mathcal{B}_s \subseteq \mathcal{B}_f$ can be transformed to some function $h'(\mathbf{x}, z_f)$ in $\mathcal{F}^2$ with $z_f$ belonging to the forward
reference partition $[\mathcal{A}_f, \mathcal{B}_f]$. The same result holds for the backward partition.

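Proof. The proof is by construction. Let the parameters of the partition $[\mathcal{A}_s, \mathcal{B}_s]$ be
$(g_s, g_{s,1}, g_{s,2}, g_{s,3}, g_{s,4})$. Our goal is to compute a new set of parameters $(g_f, g_{f,1}, g_{f,2}, g_{f,3}, g_{f,4})$
corresponding to the forward reference partition such that the associated functions keep the same value at the
minimum:

$$\min_{z_f} h'(\mathbf{x}, z_f) = \min_{z_s} h(\mathbf{x}, z_s), \quad \forall \mathbf{x} \tag{43}$$
$$\min_{z_f} \big(h'_1(\mathbf{x}) + h'_2(\mathbf{x}, z_f)\big) = \min_{z_s} \big(h_1(\mathbf{x}) + h_2(\mathbf{x}, z_s)\big), \quad \forall \mathbf{x} \tag{44}$$
$$\min_{z_f} h'_2(\mathbf{x}, z_f) = \min_{z_s} h_2(\mathbf{x}, z_s), \quad \forall \mathbf{x} \tag{45}$$

We can rewrite $h_2$ and $h'_2$ using the $\kappa$ function:

$$\min_{z_f} \kappa(f, S) z_f = \min_{z_s} \kappa(s, S) z_s, \quad \forall S \in \mathcal{P} \tag{46}$$

By substituting the values of $z_s$ and $z_f$ for all $S \in \mathcal{P}$ we obtain five equations with five unknowns
$(g_f, g_{f,1}, g_{f,2}, g_{f,3}, g_{f,4})$. We rewrite the equations as:

$$\underbrace{\begin{pmatrix} 1 & 0 & -1 & -1 & -1 \\ 1 & -1 & 0 & -1 & -1 \\ 1 & -1 & -1 & 0 & -1 \\ 1 & -1 & -1 & -1 & 0 \\ 1 & -1 & -1 & -1 & -1 \end{pmatrix}}_{H} \begin{pmatrix} g_f \\ g_{f,1} \\ g_{f,2} \\ g_{f,3} \\ g_{f,4} \end{pmatrix} = \begin{pmatrix} \min(0, \kappa(s, \{2,3,4\})) \\ \min(0, \kappa(s, \{1,3,4\})) \\ \min(0, \kappa(s, \{1,2,4\})) \\ \min(0, \kappa(s, \{1,2,3\})) \\ \min(0, \kappa(s, S_4)) \end{pmatrix} \tag{47}$$
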
The solution to the above linear system is unique because $H$ is of rank 5. Now we show that the solution satisfies
the submodularity condition and corresponds to the forward reference partition. Submodularity is ensured
by the constraint that the parameters $(g_{f,1}, g_{f,2}, g_{f,3}, g_{f,4})$ are all non-negative. Using equation (47) and the
non-negativity of the original variables $(g_{s,i})$ we obtain the following:

$$g_{f,i} = \min(0, \kappa(s, S_4 \setminus i)) - \min(0, \kappa(s, S_4)) \tag{48}$$
$$\kappa(s, S_4) \le \kappa(s, S_4 \setminus i) \tag{49}$$

From these equations we can show that $g_{f,i}$ is always non-negative:

$$g_{f,i} = \begin{cases} 0 & \text{if } \kappa(s, S_4) \ge 0 \text{ and } \kappa(s, S_4 \setminus i) \ge 0 \\ -\kappa(s, S_4) & \text{if } \kappa(s, S_4) < 0 \text{ and } \kappa(s, S_4 \setminus i) \ge 0 \\ \kappa(s, S_4 \setminus i) - \kappa(s, S_4) & \text{if } \kappa(s, S_4) < 0 \text{ and } \kappa(s, S_4 \setminus i) < 0 \end{cases} \tag{50}$$

We now prove that the computed parameters correspond to the forward reference partition:

$$S \in \begin{cases} \mathcal{B}_f & \text{if } |S| \ge 3 \\ \mathcal{A}_f & \text{otherwise} \end{cases} \tag{51}$$

From equation (47) it follows that any set $S$ such that $|S| \ge 3$ exists in $\mathcal{B}_f$. We need to prove the remaining
case where $|S| < 3$. To do this, we consider $S = \{i, j\} = S_4 \setminus \{k, l\}$ and examine its partition coefficients:

$$\kappa(f, \{i,j\}) = \kappa(f, \{i,j,k\}) + g_{f,k}$$
$$\kappa(f, \{i,j\}) = \kappa(f, \{i,j,k\}) + \big(\kappa(f, \{i,j,l\}) - \kappa(f, \{i,j,k,l\})\big)$$
$$= \min(0, \kappa(s, \{i,j,k\})) + \min(0, \kappa(s, \{i,j,l\})) - \min(0, \kappa(s, \{i,j,k,l\}))$$

As in table 2 (see appendix), $\kappa(f, \{i,j\})$ has four possible values and $\kappa(f, \{i,j\}) \ge 0$ in all. As each set
$S : |S| = 2$ exists in $\mathcal{A}_f$, every other set with a cardinality less than two must also exist in $\mathcal{A}_f$. Hence, for
every partition $[\mathcal{A}_s, \mathcal{B}_s]$ satisfying $\mathcal{B}_s \subseteq \mathcal{B}_f$, we can compute an equivalent reference partition $[\mathcal{A}_f, \mathcal{B}_f]$. □
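
The construction in the proof of lemma 4 can be traced on a small instance. The sketch below solves system (47) in closed form, using equation (48) for each $g_{f,i}$ and the $S_4$ row for $g_f$, and then compares the two minima on every labelling. The function names and the sample parameters are hypothetical and serve only as an illustration; indices are 0-based.

```python
from itertools import product

S4 = (0, 1, 2, 3)


def kappa(g, weights, subset):
    """Partition coefficient kappa(., S) = g - sum of weights[i] for i in S."""
    return g - sum(weights[i] for i in subset)


def forward_parameters(g_s, weights_s):
    """Closed-form solution of system (47)."""
    k_full = min(0, kappa(g_s, weights_s, S4))
    weights_f = [
        min(0, kappa(g_s, weights_s, [j for j in S4 if j != i])) - k_full
        for i in S4
    ]  # equation (48)
    g_f = k_full + sum(weights_f)  # last row of (47)
    return g_f, weights_f


def same_minimum(g_s, weights_s, g_f, weights_f):
    """Check equation (46) for all 16 subsets S."""
    for x in product((0, 1), repeat=4):
        subset = [i for i in S4 if x[i]]
        lhs = min(0, kappa(g_f, weights_f, subset))
        rhs = min(0, kappa(g_s, weights_s, subset))
        if lhs != rhs:
            return False
    return True
```

For example, the sample parameters $g_s = 6$, $(g_{s,i}) = (3, 3, 1, 1)$ give $\mathcal{B}_s = \{\{1,2,3\}, \{1,2,4\}, S_4\} \subseteq \mathcal{B}_f$, and the routine returns $g_f = 4$, $(g_{f,i}) = (2, 2, 1, 1)$.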



Lemma 5. Let $P = \{i, j, k, l\} = S_4$ and let $z_s$ be the auxiliary variable in $h(\mathbf{x}, z_s)$ associated with the
partition $[\mathcal{A}_s, \mathcal{B}_s]$. If both $A$ and $B = P \setminus A$ are elements of $\mathcal{B}_s$, then it is not possible to have both $C$ and
$D = P \setminus C$ in $\mathcal{A}_s$.

Proof. The statement follows by contradiction. Let $\{A, B\}$, where $B = P \setminus A$, exist in $\mathcal{B}_s$. The partition
coefficients of $A$ and $B$ with respect to $z_s$ are shown below:

$$\kappa(s, A) = g_s - \sum_{i=1}^{4} g_{s,i} 1_i^A < 0$$
$$\kappa(s, B) = g_s - \sum_{i=1}^{4} g_{s,i} 1_i^B < 0$$

Note that $A \cup B = \{i, j, k, l\}$ and $A \cap B = \emptyset$. Hence by summing the above equations we get the following:

$$2 g_s - g_{s,i} - g_{s,j} - g_{s,k} - g_{s,l} < 0 \tag{52}$$
Fig. 5. Examples for the four cases in tables 3, 4, 5 and 6. In the first case the transition in (a) is mapped to that in (b) and 
the associated parameters are given by: ((g a — 6,g s ,i = 1,<? S ,2 = lj<?s,3 — l 5 ffs.4 = l)i(<7i — 5jfft,l — l>fl*,2 = 
2 ; ffi,3 = 2, gt,i = 3)). The generated pairwise term, independent of AWs, is —X3X4. The second case is in (c) and 
(d) with the parameters ((5, 1, 2, 3, 4, 5), (2, 1, 1, 1, 1)) (shown in the same order as the earlier one) and the pairwise 
function is $-x_2 x_4 - 2 x_3 x_4$. The third case is in (e) and (d) with the parameters ((5, 4, 1, 1, 1), (2, 1, 1, 1, 1)) along with
the pairwise function $-x_1 x_2 - x_1 x_3 - x_1 x_4$. The final case is in (f), (g) and (h), as the final function has two AVs $z_2$ and
$z_3$. The function consisting of unary and pairwise terms independent of AVs is given by $1 - x_1 - x_2 - x_3 - x_1 x_3 - 2 x_2 x_3$.
Corresponding parameters are given by: (($g_s = 8$, $g_{s,1} = 4$, $g_{s,2} = 5$, $g_{s,3} = 6$, $g_{s,4} = 0$), ($g_t = 4$, $g_{t,1} = 2$, $g_{t,2} = 2$,
$g_{t,3} = 2$, $g_{t,4} = 0$), ($g_r = 2$, $g_{r,1} = 1$, $g_{r,2} = 1$, $g_{r,3} = 1$, $g_{r,4} = 0$)).
Assume now that a different pair $\{C, D\}$, where $D = P \setminus C$, exists in $\mathcal{A}_s$. By summing their corresponding
partition coefficients we get the following equation:

$$2 g_s - g_{s,i} - g_{s,j} - g_{s,k} - g_{s,l} \ge 0 \tag{53}$$

Equations 52 and 53 lead to a contradiction, therefore the lemma holds. □
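
Lemma 5 can also be checked exhaustively on a small integer grid of parameters. The sketch below is only an illustration: it assumes that ties ($\kappa = 0$) are assigned to $\mathcal{A}_s$, the grid bound `max_value` is an arbitrary choice, and the function names are hypothetical.

```python
from itertools import product

FULL = 0b1111  # bitmask of the ground set {i, j, k, l}


def kappa(g, weights, mask):
    """Partition coefficient of the subset encoded by the 4-bit mask."""
    return g - sum(w for bit, w in enumerate(weights) if (mask >> bit) & 1)


def violates_lemma5(g, weights):
    """True if one complementary pair lies in B_s and another in A_s."""
    in_b = any(
        kappa(g, weights, m) < 0 and kappa(g, weights, FULL ^ m) < 0
        for m in range(16)
    )
    in_a = any(
        kappa(g, weights, m) >= 0 and kappa(g, weights, FULL ^ m) >= 0
        for m in range(16)
    )
    return in_b and in_a


def search_counterexample(max_value=6):
    """Return the first violating (g, weights) on the grid, or None."""
    for g in range(max_value + 1):
        for weights in product(range(max_value + 1), repeat=4):
            if violates_lemma5(g, weights):
                return g, weights
    return None
```

The search finds no violation, as expected: the two coefficients of any complementary pair always sum to $2 g_s - g_{s,i} - g_{s,j} - g_{s,k} - g_{s,l}$, whose sign is fixed by the parameters.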

Theorem 2. Any function $h(\mathbf{x}, z_s)$ in $\mathcal{F}^2$ with $z_s$ associated with $[\mathcal{A}_s, \mathcal{B}_s]$, such that $\forall B \in \mathcal{B}_s$, $|B| \ge 2$,
can be transformed to another function $h''(\mathbf{x}, z_f, z_b)$ in $\mathcal{F}^2$ without any $z_f z_b$ terms, where $z_f$ and $z_b$ are AVs
corresponding to the forward and backward reference partitions respectively.

Proof. Our proof by construction takes the form of a two-step procedure. In the first step every function
$h(\mathbf{x}, z_s)$ is transformed to $h'(\mathbf{x}, z_t, z_r)$ where $z_t$ and $z_r$ are associated with the partition $[\mathcal{A}_t, \mathcal{B}_t]$ and the
backward partition $[\mathcal{A}_r, \mathcal{B}_r]$ respectively and satisfy the conditions $\mathcal{B}_t \subseteq \mathcal{B}_f$ and $\mathcal{B}_r \subseteq \mathcal{B}_b$. In the second
step we use lemma 4 to transform $h'(\mathbf{x}, z_t, z_r)$ to $h''(\mathbf{x}, z_f, z_b)$. In most cases only one partition, either the
forward or the backward, is used.

$$\min_{z_s} h_2(\mathbf{x}, z_s) = \min_{z_s} \kappa(s, S) z_s, \quad \forall S \in \mathcal{P} \tag{54}$$
$$= \sum_{i=1}^{4} a_i x_i + \sum_{i=1}^{4} \sum_{j=i+1}^{4} a_{ij} x_i x_j + \min_{z_t} \kappa(t, S) z_t + \min_{z_r} \kappa(r, S) z_r, \quad \forall S \in \mathcal{P} \tag{55}$$

The key idea is to decompose $h_2$ into functions of unary and pairwise terms involving only $\mathbf{x}$ and functions
involving new auxiliary variables $z_t$ and $z_r$. Consider the condition $|B| \ge 2$. A degenerate case occurs where
$|B| \ge 3$; here we can directly use lemma 4 to obtain our desired result. We now consider the cases where
at least one set $S \in \mathcal{B}_s$ has cardinality two and show a transformation similar to the general one of (55).
Tables 3, 4, 5 and 6 in the appendix contain details of the decomposition.

After the decomposition the new partitions $[\mathcal{A}_t, \mathcal{B}_t]$ and $[\mathcal{A}_r, \mathcal{B}_r]$ satisfy the conditions $\mathcal{B}_t \subseteq \mathcal{B}_f$ and
$\mathcal{B}_r \subseteq \mathcal{B}_b$. To show this, we first consider the case where exactly one set $S \in \mathcal{B}_s$ has a cardinality of 2. There
are six such occurrences, and all of them are symmetrical. The transformation for this case is in table 3.

Next, consider the case where exactly two sets of cardinality two exist in $\mathcal{B}_s$. Although there are 15
($\binom{6}{2}$) possible cases, they must all be of the form $\{\{i,j\}, \{k,l\}\}$ or $\{\{i,j\}, \{j,k\}\}$. The first sub-case is
prohibited because the presence of the mutually exclusive pair $\{\{i,j\}, \{k,l\}\}$ in $\mathcal{B}_s$ would not permit any other
mutually exclusive pair $\{\{i,k\}, \{j,l\}\}$ to exist in $\mathcal{A}_s$ as per lemma 5. The transformation for the latter case
is in table 4.

Finally, consider the case where exactly three sets of cardinality two exist in $\mathcal{B}_s$. The 20 different occurrences
($\binom{6}{3}$) can be grouped into three different scenarios: $\{\{i,j\}, \{i,k\}, \{i,l\}\}$, $\{\{i,j\}, \{k,l\}, \{i,k\}\}$ and
$\{\{i,j\}, \{j,k\}, \{i,k\}\}$. Again, lemma 5 prevents the second scenario $\{\{i,j\}, \{k,l\}, \{i,k\}\}$ from occurring.
The transformations of the first and the third cases are in tables 5 and 6. Example transformations are shown
in figure 5. □
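
The case counts used in the proof of theorem 2 can be reproduced by enumeration. The sketch below lists all families of two or three 2-element subsets of $S_4$ and counts those that contain a complementary pair such as $\{\{i,j\}, \{k,l\}\}$, which are the families the proof excludes through lemma 5. The function names are illustrative only.

```python
from itertools import combinations

# The six 2-element subsets of S4 = {0, 1, 2, 3}.
PAIRS = list(combinations(range(4), 2))


def has_complementary_pair(family):
    """Two 2-element subsets of a 4-element set are complementary iff disjoint."""
    return any(set(a).isdisjoint(b) for a, b in combinations(family, 2))


def classify(n_sets):
    """Return (number of families of n_sets pairs, number with a complementary pair)."""
    families = list(combinations(PAIRS, n_sets))
    excluded = [f for f in families if has_complementary_pair(f)]
    return len(families), len(excluded)
```

Here `classify(2)` gives 15 families, 3 of which are complementary pairs, and `classify(3)` gives 20 families, 12 of which contain a complementary pair; the remaining 8 are the four families of the form $\{\{i,j\}, \{i,k\}, \{i,l\}\}$ and the four of the form $\{\{i,j\}, \{j,k\}, \{i,k\}\}$.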

Theorem 3. Any function $h(\mathbf{x}, z_1, z_2, \ldots, z_k)$ in $\mathcal{F}^2$ that is linear in $\mathbf{z}$ can be transformed to some function
$h'(\mathbf{x}, z_f, z_b)$ in $\mathcal{F}^2$ where $z_f$ and $z_b$ correspond to the forward and backward reference partitions respectively.

Proof. Every $z_i$ is independent of every other $z_j$ due to the absence of bilinear terms $z_i z_j$. Hence, the
minimization over $\mathbf{z}$ can be carried out in any order:

$$\min_{z_i, z_j} h(\mathbf{x}, z_i, z_j) = \min_{z_i} \min_{z_j} h(\mathbf{x}, z_i, z_j) = \min_{z_j} \min_{z_i} h(\mathbf{x}, z_i, z_j) \tag{56}$$

Applying lemma 3, followed by theorem 2, for every AV, the function $h(\mathbf{x}, z_1, z_2, \ldots, z_k)$ can be transformed
into $h(\mathbf{x}, z_1, z'_1, \ldots, z_k, z'_k)$ where $z_i$ and $z'_i$ correspond to the forward and backward reference partitions
respectively. In other words, every $z_i$ in the original function is replaced by $z_i$ and $z'_i$. Note that one reference
partition may be sufficient in some cases. Finally we use lemma 11 to obtain $h'(x_1, x_2, x_3, x_4, z_f, z_b)$
from $h$. □



5.2 Interacting AVs

The earlier theorem shows the transformation when the original function h has no bilinear terms ZiZj. The 
problem becomes more intricate in the presence of these terms. In the earlier case, we could define partitions 
using a single variable. Here, it is necessary to consider the partitions using two or more variables. Below, 
we show the joint partition that can solve the transformation with interactions between the AVs. We refer to
these as the matroidal generators, since the associated partitions satisfy matroid constraints (see appendix).

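Definition 10. The matroidal generators associated with two AVs $z_{j1}$ and $z_{j2}$ for expressing all
graph-representable fourth order functions are given below:

$$B \in \mathcal{B}_{j1} \iff |B| \ge 3, \quad \mathcal{A}_{j1} = \mathcal{P} \setminus \mathcal{B}_{j1} \tag{57}$$
$$B \in \mathcal{B}_{j2} \iff |B| \ge 2, \quad \mathcal{A}_{j2} = \mathcal{P} \setminus \mathcal{B}_{j2} \tag{58}$$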



In Figure 3 we show the matroidal generators for fourth order functions. These partitions are the same as the
reference partitions studied earlier. The expressive power of these AVs is enhanced by their interaction, i.e. the
usage of the bilinear term $z_{j1} z_{j2}$.

Theorem 4. Any function $h(\mathbf{x}, z_1, z_2, \ldots, z_k)$ in $\mathcal{F}^2$ that has bilinear terms $z_i z_j$ can be transformed to some
function $h'(\mathbf{x}, z_{j1}, z_{j2})$ in $\mathcal{F}^2$.

Proof. The basic idea of the proof is to decompose a given fourth order function using the result of [23] and
show that all the spawned MBFs can be expressed by the matroidal generators. Using Theorem 5.2 from [23]
we can decompose a given submodular function in $\mathcal{F}^4$ into 10 different groups $\mathcal{G}_i$, $i = \{1..10\}$, where each
$\mathcal{G}_i$ is listed in Table 1.

Each group $\mathcal{G}_i$ contains three or four functions, giving rise to a total of 30 or more different functions.
Prior work uses one auxiliary variable for every function, whereas we will show that the two AVs corresponding
to the matroidal generators are sufficient to simultaneously model all these functions. As shown
in [31] the functions in $\mathcal{G}_{10}$ are not graph-representable. Note that the functions in $\mathcal{G}_{10}$ do not become
graph-representable when combined with other generators of $\mathcal{F}^4$ according to Theorem 16(3) in [31]. We
also observe that these functions are not representable by either non-interacting or interacting AVs. Thus the
largest subclass $\mathcal{F}^4_2$ should be composed of functions in the remaining 9 groups.

As the functions present in the groups $\mathcal{G}_i$, $i = \{1..8\}$ do not require bilinear AV terms, any sum of
functions in $\mathcal{G}_i$, $i = \{1..8\}$ can be expressed with only two AVs $z_f$ and $z_b$ according to Theorem 3. We now
consider the functions in $\mathcal{G}_9$. The sum of functions in this group may lead to two alternatives. The union of
functions in $\mathcal{G}_9$ may either result in a function in $\mathcal{G}_9$ or a function that uses the AVs $z_f$ and $z_b$. Any function
Group | $f(\mathbf{x})$ | $\min_{z_1, z_2} h(\mathbf{x}, z_1, z_2)$ where $h(\mathbf{x}, z_1, z_2) \in \mathcal{F}^2$
$\mathcal{G}_1$ | $-x_i x_j$ |
$\mathcal{G}_2$ | $-x_i x_j x_k$ | $\min_z (z(2 - x_i - x_j - x_k))$
$\mathcal{G}_3$ | $-x_1 x_2 x_3 x_4$ | $\min_z (z(3 - x_1 - x_2 - x_3 - x_4))$
$\mathcal{G}_4$ | $-x_1 x_2 x_3 x_4 + x_1 x_2 x_3 + x_1 x_2 x_4 + x_1 x_3 x_4 + x_2 x_3 x_4 - x_1 x_2 - x_1 x_3 - x_1 x_4 - x_2 x_3 - x_2 x_4 - x_3 x_4$ | $\min_z (z(1 - x_1 - x_2 - x_3 - x_4))$
$\mathcal{G}_5$ | $x_i x_j x_k x_l - x_i x_j x_k - x_i x_l - x_j x_l - x_k x_l$ | $\min_z (z(2 - x_i - x_j - x_k - 2 x_l))$
$\mathcal{G}_6$ | $x_i x_j x_k - x_i x_j - x_i x_k - x_j x_k$ | $\min_z (z(1 - x_i - x_j - x_k))$
$\mathcal{G}_7$ | $x_i x_j x_k x_l - x_i x_j x_k - x_i x_j x_l - x_i x_k x_l$ | $\min_z (z(3 - 2 x_i - x_j - x_k - x_l))$
$\mathcal{G}_8$ | $2 x_1 x_2 x_3 x_4 - x_1 x_2 x_3 - x_1 x_2 x_4 - x_1 x_3 x_4 - x_2 x_3 x_4$ | $\min_z (z(2 - x_1 - x_2 - x_3 - x_4))$
$\mathcal{G}_9$ | $x_i x_j x_k x_l - x_i x_j - x_i x_k x_l - x_j x_k x_l$ | $\min_{z_1, z_2} (z_1 + 2 z_2 - z_1 z_2 - z_1 x_i - z_1 x_j - z_2 x_k - z_2 x_l)$
$\mathcal{G}_{10}$ | $-x_i x_j x_k x_l + x_i x_k x_l + x_j x_k x_l - x_i x_k - x_i x_l - x_j x_k - x_j x_l - x_k x_l$ | $f(\mathbf{x}) \notin \mathcal{F}^2$ as shown in [31]

Table 1. The above table is adapted from Figure 2 of [32] where $\{i, j, k, l\} = S_4$. Each group has several terms
depending on the values of $\{i, j, k, l\}$. As the groups $\mathcal{G}_4$ and $\mathcal{G}_8$ are symmetric with respect to $\{i, j, k, l\}$, they contain
one function each.



in $\mathcal{G}_9$ can be expressed using two AVs $z_{91}$ and $z_{92}$ [30]. As a result, the sum of functions in $\mathcal{G}_i$, $i = \{1..9\}$
can be expressed using four AVs $(z_f, z_b, z_{91}, z_{92})$. These four AVs could be merged into two AVs $z_{j1}$ and $z_{j2}$
in the matroidal generators as shown in Figure 3.

Hence, all functions in $\mathcal{G}_i$, $i = \{1..9\}$ can be expressed by the matroidal generators. □
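
The identities between the higher-order polynomials of Table 1 and their quadratic forms with auxiliary variables can be verified by enumerating all 16 labellings. The sketch below is illustrative: it fixes $(i, j, k, l) = (1, 2, 3, 4)$ (0-based in the code), and the polynomial forms for $\mathcal{G}_4$, $\mathcal{G}_5$ and $\mathcal{G}_9$ are the ones obtained by expanding the corresponding minimisations, which is an assumption about the intended table entries. All names are hypothetical.

```python
from itertools import product
from math import prod


def m(x, *idx):
    """Monomial: product of x[i] over the given indices."""
    return prod(x[i] for i in idx)


def min_single(const, coeffs, x):
    """min over z in {0, 1} of z * (const - sum_i coeffs[i] * x[i])."""
    return min(0, const - sum(c * xi for c, xi in zip(coeffs, x)))


def g9_min(x):
    """min over (z1, z2) of the G9 quadratic form with a bilinear z1*z2 term."""
    xi, xj, xk, xl = x
    return min(
        z1 + 2 * z2 - z1 * z2 - z1 * xi - z1 * xj - z2 * xk - z2 * xl
        for z1, z2 in product((0, 1), repeat=2)
    )


IDENTITIES = [
    ("G2", lambda x: -m(x, 0, 1, 2), lambda x: min_single(2, (1, 1, 1, 0), x)),
    ("G3", lambda x: -m(x, 0, 1, 2, 3), lambda x: min_single(3, (1, 1, 1, 1), x)),
    ("G4", lambda x: -m(x, 0, 1, 2, 3)
        + m(x, 0, 1, 2) + m(x, 0, 1, 3) + m(x, 0, 2, 3) + m(x, 1, 2, 3)
        - m(x, 0, 1) - m(x, 0, 2) - m(x, 0, 3)
        - m(x, 1, 2) - m(x, 1, 3) - m(x, 2, 3),
     lambda x: min_single(1, (1, 1, 1, 1), x)),
    ("G5", lambda x: m(x, 0, 1, 2, 3) - m(x, 0, 1, 2)
        - m(x, 0, 3) - m(x, 1, 3) - m(x, 2, 3),
     lambda x: min_single(2, (1, 1, 1, 2), x)),
    ("G6", lambda x: m(x, 0, 1, 2) - m(x, 0, 1) - m(x, 0, 2) - m(x, 1, 2),
     lambda x: min_single(1, (1, 1, 1, 0), x)),
    ("G7", lambda x: m(x, 0, 1, 2, 3) - m(x, 0, 1, 2) - m(x, 0, 1, 3) - m(x, 0, 2, 3),
     lambda x: min_single(3, (2, 1, 1, 1), x)),
    ("G8", lambda x: 2 * m(x, 0, 1, 2, 3) - m(x, 0, 1, 2) - m(x, 0, 1, 3)
        - m(x, 0, 2, 3) - m(x, 1, 2, 3),
     lambda x: min_single(2, (1, 1, 1, 1), x)),
    ("G9", lambda x: m(x, 0, 1, 2, 3) - m(x, 0, 1) - m(x, 0, 2, 3) - m(x, 1, 2, 3),
     g9_min),
]


def failing_identities():
    """Labels of the rows whose two sides differ on some labelling."""
    return [
        label
        for label, poly, quad in IDENTITIES
        if any(poly(x) != quad(x) for x in product((0, 1), repeat=4))
    ]
```

An empty result from `failing_identities()` means that every listed polynomial agrees with its minimised quadratic form on all labellings.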

6 Linear Programming solution 

For a given function $f(x_1, x_2, x_3, x_4)$ in $\mathcal{F}^4_2$, our goal is to compute a function $h(\mathbf{x}, \mathbf{z})$ in $\mathcal{F}^2$. As a result of
theorem 4 we only need to solve the case with two AVs $(z_{j1}, z_{j2})$ associated with the matroidal generators.
The required function $h(\mathbf{x}, \mathbf{z})$ is:

$$h(\mathbf{x}, z_{j1}, z_{j2}) = b_0 + \sum_{i} b_i x_i - \sum_{i > j} b_{ij} x_i x_j + \Big(g_{j1} - \sum_{i} g_{j1,i} x_i\Big) z_{j1} + \Big(g_{j2} - \sum_{i} g_{j2,i} x_i\Big) z_{j2} - j_{12} z_{j1} z_{j2} \tag{59}$$

such that $b_{ij}, g_{j1,i}, g_{j2,i}, j_{12} \ge 0$ and $i, j \in S_4$. As we know the partition of $(z_{j1}, z_{j2})$ we know their
Boolean values for all labellings of $\mathbf{x}$. We need the coefficients $(b_i, b_{ij}, j_{12}, g_{j1}, g_{j2}, g_{j1,i}, g_{j2,i})$, $i \in S_4$, to
compute $h(x_1, x_2, x_3, x_4, z_{j1}, z_{j2})$. These coefficients satisfy both submodularity constraints (that the coefficients
of all bilinear terms $(x_i x_j, x_i z_{j1}, x_j z_{j2}, z_{j1} z_{j2})$ are less than or equal to zero) and those imposed by
the reference partitions. First we list these conditions below:

$$S_p = \begin{pmatrix} b_{ij} & g_{j1,i} & g_{j2,i} & j_{12} \end{pmatrix}^{T} \ge \mathbf{0}, \quad i, j \in S_4,\ i \ne j \tag{60}$$


20 Srikumar Ramalingam, Chris Russell, Lubor Ladicky and Philip H.S. Torr 



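where $\mathbf{0}$ refers to a vector composed of 0's of appropriate length. Next we list the conditions which guarantee
$f(\mathbf{x}) = \min_{z_{j1}, z_{j2}} h(\mathbf{x}, z_{j1}, z_{j2})$ for all $\mathbf{x}$. Consider every $S \in \mathcal{P}$, and let the value of $z_{j1} z_{j2}$ for different subsets $S$ be
given by $\eta(S)$. As we know the partition functions of both $z_{j1}$ and $z_{j2}$ it is easy to find this. Let $\mathcal{G}$ and $\mathcal{H}$
denote the values of $f$ and $h$ for different $S$:

$$\mathcal{G} = f(1_1^S, 1_2^S, 1_3^S, 1_4^S) \tag{61}$$
$$\mathcal{H} = h(1_1^S, 1_2^S, 1_3^S, 1_4^S, 0, 0) + \Big(g_{j1} - \sum_{i=1}^{4} g_{j1,i} 1_i^S\Big) \mathbb{1}[S \in \mathcal{B}_{j1}] + \Big(g_{j2} - \sum_{i=1}^{4} g_{j2,i} 1_i^S\Big) \mathbb{1}[S \in \mathcal{B}_{j2}] - j_{12}\, \eta(S) \tag{62}$$
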

As a result we have the following 16 linear equations (N.B. there are $2^4$ (16) different $S$):

$$\mathcal{G} = \mathcal{H}, \quad \forall S \in \mathcal{P} \tag{63}$$

Note that, as in section 5, we do not make use of either auxiliary variables or the min operator over $\mathcal{H}$.
Again, this is because we already know the partition of $(z_{j1}, z_{j2})$ and their appropriate values a priori. This
can be seen from the fact that (63) need not hold if $z_{j1}$ and $z_{j2}$ do not lie in the reference partitions.
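
The role of the known partitions in (63) can be illustrated on a toy instance. The sketch below evaluates only the AV-dependent part of (59) (the unary and pairwise terms in $\mathbf{x}$ are set to zero), plugs in the AV values prescribed by the matroidal generators of Definition 10, and compares the result with an explicit minimisation over $(z_{j1}, z_{j2})$. The coefficient values and function names are hypothetical examples, not values from the paper.

```python
from itertools import product


def h_av(x, z1, z2, g1, w1, g2, w2, j12):
    """AV-dependent part of equation (59) for a labelling x."""
    term1 = (g1 - sum(w * xi for w, xi in zip(w1, x))) * z1
    term2 = (g2 - sum(w * xi for w, xi in zip(w2, x))) * z2
    return term1 + term2 - j12 * z1 * z2


def generator_values(x):
    """AV values from Definition 10: z_j1 = 1 iff |S| >= 3, z_j2 = 1 iff |S| >= 2."""
    size = sum(x)
    return int(size >= 3), int(size >= 2)


def prescribed_equals_min(g1, w1, g2, w2, j12):
    """True if the prescribed AV values attain the minimum for every labelling."""
    for x in product((0, 1), repeat=4):
        z1, z2 = generator_values(x)
        best = min(
            h_av(x, a, b, g1, w1, g2, w2, j12)
            for a, b in product((0, 1), repeat=2)
        )
        if h_av(x, z1, z2, g1, w1, g2, w2, j12) != best:
            return False
    return True
```

With unit weights, $g_{j1} = 3$, $g_{j2} = 1$ and $j_{12} = 1$ the prescribed values are optimal for every labelling, so the min operator can be dropped. Lowering $g_{j1}$ to 2 moves the optimal $z_{j1}$ out of its reference partition, and the equality fails.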



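$$\mathcal{G}_g = \begin{pmatrix} g_f - \sum_{i=1}^{4} g_{f,i} 1_i^S \\ g_b - \sum_{i=1}^{4} g_{b,i} 1_i^D \end{pmatrix} \ge \mathbf{0}, \quad S \in \mathcal{A}_f,\ D \in \mathcal{A}_b$$

$$\mathcal{G}_l = \begin{pmatrix} g_f - \sum_{i=1}^{4} g_{f,i} 1_i^S \\ g_b - \sum_{i=1}^{4} g_{b,i} 1_i^D \end{pmatrix} \le \mathbf{0}, \quad S \in \mathcal{B}_f,\ D \in \mathcal{B}_b$$

Essentially we need to compute the coefficients $(b_{ij}, g_{j1}, g_{j1,i}, g_{j2}, g_{j2,i}, j_{12})$ that satisfy the equations (60, 63, 64).
This is equivalent to finding a feasible point in a linear programming problem:

$$\min\ \text{const} \tag{64}$$
$$\text{s.t. } S_p \ge \mathbf{0},\quad \mathcal{G} = \mathcal{H},\quad \mathcal{G}_g \ge \mathbf{0},\quad \mathcal{G}_l \le \mathbf{0} \tag{65}$$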

As discussed in section 4, by using a different cost function we can formulate a problem to compute a
function in $\mathcal{F}^2$ closest to a given arbitrary fourth-order function.



7 Discussion and open problems 

We observe that the basis MBFs corresponding to reference partitions always satisfy matroid constraints
(see appendix). It can be easily shown that for $k = 3$ there is only one reference partition, corresponding to
a uniform matroid $U_1$. When $k = 4$ we have two reference partitions corresponding to uniform matroids $U_1$
and $U_2$. Thus we conjecture that we can transform a large subclass, possibly the largest, of $\mathcal{F}^k_2$ using $k - 2$
matroidal generators. Each of these generators corresponds to one of the uniform matroids $U_1, U_2, U_3, \ldots, U_{k-2}$. We do
not have any proof for this result. However, our intuition is based on the following reasons:

- The reference partitions for $k = 3$ and $k = 4$ are symmetrical with respect to all $x_i$ variables.

- The reference partitions correspond to only distinct uniform matroids.

- We can only transform a subclass of all submodular functions of order $k$. Using the result of Zivny et
al., we know that when $k \ge 4$, not all submodular functions can be transformed to a quadratic PBF.

- Although we use only a linear number of auxiliary variables, the underlying function is powerful as we
employ all possible interactions among the auxiliary variables. Each of these interactions can be seen as
the intersection of two uniform matroids.
A Tables

Case | $\min(0, \kappa(s, \{i,j,k\}))$ | $\min(0, \kappa(s, \{i,j,l\}))$ | $\min(0, \kappa(s, \{i,j,k,l\}))$ | $\kappa(f, \{i,j\})$
1 | 0 | 0 | 0 | 0
2 | 0 | $\kappa(s, \{i,j,l\})$ | $\kappa(s, S_4)$ | $g_{s,k}$
3 | $\kappa(s, \{i,j,k\})$ | 0 | $\kappa(s, \{i,j,k,l\})$ | $g_{s,l}$
4 | $\kappa(s, \{i,j,k\})$ | $\kappa(s, \{i,j,l\})$ | $\kappa(s, S_4)$ | $\kappa(s, \{i,j\})$

Table 2. See lemma 4. In all four cases $\kappa(f, \{i,j\})$ is non-negative. This result holds for the fourth case as
$\kappa(s, \{i,j\}) \ge 0$.



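Case 1: $\{i,j\} \in \mathcal{B}_s$.

$$h_2 = \kappa(s, \{i,j\}) x_i x_j + \Big( \underbrace{(2 g_s - g_{s,i} - g_{s,j})}_{g_t} - \underbrace{(g_s - g_{s,i})}_{g_{t,i}} x_i - \underbrace{(g_s - g_{s,j})}_{g_{t,j}} x_j - \underbrace{g_{s,k}}_{g_{t,k}} x_k - \underbrace{g_{s,l}}_{g_{t,l}} x_l \Big) z_t$$

$S$ | $\kappa(t, S)$ | $S \in \mathcal{A}_t$ or $S \in \mathcal{B}_t$
$\{i,j\}$ | 0 | $S \in \mathcal{A}_t$
$\{i,k\}$ | $\kappa(s, \{j,k\})$ | $S \in \mathcal{A}_t$ since $\{j,k\} \in \mathcal{A}_s$
$\{k,l\}$ | $\kappa(s, \{i,k\}) + \kappa(s, \{j,l\})$ | $S \in \mathcal{A}_t$ since $\{i,k\}, \{j,l\} \in \mathcal{A}_s$
$\{i,j,k\}$ | $-g_{s,k}$ | $S \in \mathcal{B}_t$
$\{i,k,l\}$ | $\kappa(s, \{j,k,l\})$ | $S \in \mathcal{B}_t$ since $\{j,k,l\} \in \mathcal{B}_s$

Table 3. See theorem 2. Case 1: The details of the transformation (similar to the one in equation (55)) are shown for a
scenario where exactly one set ($\{i,j\}$) with cardinality two exists in $\mathcal{B}_s$. We prove that after the transformation all the
sets $S$ with $|S| = 2$ exist in $\mathcal{A}_t$ and all the sets with $|S| \ge 3$ exist in $\mathcal{B}_t$. Although the reduction is illustrated for only a few cases, they
are representative of the remainder.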



B Definitions 

Definition 11. A matroid $\mathcal{M}$ is an ordered pair $(E, \mathcal{I})$ consisting of a finite set $E$ and a family of subsets $\mathcal{I}$
of $E$ satisfying the following conditions:

1. $\emptyset \in \mathcal{I}$.

2. If $I \in \mathcal{I}$ and $I' \subseteq I$, then $I' \in \mathcal{I}$.

3. If $I_1$ and $I_2$ are in $\mathcal{I}$ and $|I_1| < |I_2|$, then there is an element $e$ of $I_2 - I_1$ such that $I_1 \cup e \in \mathcal{I}$.

A maximal independent set in a matroid is called a base of the matroid. All the bases of a matroid are
equicardinal, i.e., they have the same number of elements.

Definition 12. The dual matroid of $\mathcal{M}$ is given by $\mathcal{M}^*$ whose bases are the complements of the bases of $\mathcal{M}$.

Definition 13. In a uniform matroid $U_n(E, \mathcal{I})$, all the independent sets $I_i \in \mathcal{I}$ satisfy the condition that
$|I_i| \le n$ for some fixed $n$.
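Case 2: $\{i,j\}, \{j,k\} \in \mathcal{B}_s$.

$$h_2 = \kappa(s, \{i,j\}) x_i x_j + \kappa(s, \{j,k\}) x_j x_k + \Big( \underbrace{(3 g_s - 2 g_{s,j} - g_{s,i} - g_{s,k})}_{g_t} - \underbrace{(g_s - g_{s,j})}_{g_{t,i}} x_i - \underbrace{(2 g_s - g_{s,i} - g_{s,j} - g_{s,k})}_{g_{t,j}} x_j - \underbrace{(g_s - g_{s,j})}_{g_{t,k}} x_k - \underbrace{g_{s,l}}_{g_{t,l}} x_l \Big) z_t$$

$S$ | $\kappa(t, S)$ | $S \in \mathcal{A}_t$ or $S \in \mathcal{B}_t$
$\{i,j\}$ | 0 | $S \in \mathcal{A}_t$
$\{i,l\}$ | $\kappa(s, \{i,k\}) + \kappa(s, \{j,l\})$ | $S \in \mathcal{A}_t$ since $\{i,k\}, \{j,l\} \in \mathcal{A}_s$
$\{j,l\}$ | $\kappa(s, \{j\}) + g_{s,l}$ | $S \in \mathcal{A}_t$ since $\{j\} \in \mathcal{A}_s$ and $g_{s,l} \ge 0$
$\{i,j,k\}$ | $-\kappa(s, \{j\})$ | $S \in \mathcal{B}_t$ since $\{j\} \in \mathcal{A}_s$
$\{i,k,l\}$ | $\kappa(s, \{i,k,l\})$ | $S \in \mathcal{B}_t$ if $\{i,k,l\} \in \mathcal{B}_s$
$\{i,j,l\}$ | $-g_{s,l}$ | $S \in \mathcal{B}_t$

Table 4. See theorem 2. Case 2: We study the scenario where exactly two sets with cardinality two ($\{i,j\}, \{j,k\}$) occur
in $\mathcal{B}_s$. Note that all other cases either cannot happen (according to lemma 5) or are similar to the ones shown in this table.
We also prove that after the transformation all the sets $S$ with $|S| = 2$ exist in $\mathcal{A}_t$ and all the sets with $|S| \ge 3$ exist in $\mathcal{B}_t$.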



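Case 3: $\{i,j\}, \{i,k\}, \{i,l\} \in \mathcal{B}_s$.

$$h_2 = \kappa(s, \{i,j\}) x_i x_j + \kappa(s, \{i,k\}) x_i x_k + \kappa(s, \{i,l\}) x_i x_l + \Big( \underbrace{\min(0, \kappa(s, \{j,k,l\})) + 3 (g_s - g_{s,i})}_{g_t} - \underbrace{\big(\min(0, \kappa(s, \{j,k,l\})) + 2 (g_s - g_{s,i})\big)}_{g_{t,i}} x_i - \underbrace{(g_s - g_{s,i})}_{g_{t,j}} x_j - \underbrace{(g_s - g_{s,i})}_{g_{t,k}} x_k - \underbrace{(g_s - g_{s,i})}_{g_{t,l}} x_l \Big) z_t$$

$S$ | $\kappa(t, S)$ | $S \in \mathcal{A}_t$ or $S \in \mathcal{B}_t$
$\{i,j\}$ | 0 | $S \in \mathcal{A}_t$
$\{j,k\}$ | $\min(0, \kappa(s, \{j,k,l\})) + (g_s - g_{s,i})$ | $S \in \mathcal{A}_t$ since $\{i\} \in \mathcal{A}_s$
$\{i,j,k\}$ | $-\kappa(s, \{j\})$ | $S \in \mathcal{B}_t$ since $\{j\} \in \mathcal{A}_s$
$\{i,k,l\}$ | $\kappa(s, \{i,k,l\})$ | $S \in \mathcal{B}_t$ if $\{i,k,l\} \in \mathcal{B}_s$
$\{i,j,l\}$ | $-g_{s,i}$ | $S \in \mathcal{B}_t$

Table 5. See theorem 2. Case 3: Here we study the scenario where exactly three sets with cardinality two
($\{i,j\}, \{i,k\}, \{i,l\}$) exist in $\mathcal{B}_s$. The only other case where three sets can exist is shown in table 6. The shown cases
are generalizations of all the possible cases that can occur without violating lemma 5. We prove that after the
transformation all the sets $S$ with $|S| = 2$ exist in $\mathcal{A}_t$ and all the sets with $|S| \ge 3$ exist in $\mathcal{B}_t$.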
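Case 4: $\{i,j\}, \{i,k\}, \{j,k\} \in \mathcal{B}_s$.

$$h_2 = -\kappa(s, \{i,j\}) (1 - x_i - x_j - x_k) - (g_{s,k} - g_{s,j}) x_i x_k - (g_{s,k} - g_{s,i}) x_j x_k$$
$$+ \Big( \underbrace{2 (g_s - g_{s,k})}_{g_t} - \underbrace{(g_s - g_{s,k})}_{g_{t,i}} x_i - \underbrace{(g_s - g_{s,k})}_{g_{t,j}} x_j - \underbrace{(g_s - g_{s,k})}_{g_{t,k}} x_k - \underbrace{g_{s,l}}_{g_{t,l}} x_l \Big) z_t$$
$$+ \Big( \underbrace{-2 \kappa(s, \{i,j\})}_{g_r} - \underbrace{(-\kappa(s, \{i,j\}))}_{g_{r,i}} (1 - x_i) - \underbrace{(-\kappa(s, \{i,j\}))}_{g_{r,j}} (1 - x_j) - \underbrace{(-\kappa(s, \{i,j\}))}_{g_{r,k}} (1 - x_k) - \underbrace{0}_{g_{r,l}} (1 - x_l) \Big) z_r$$

$S$ | $\kappa(t, S)$ | $S \in \mathcal{A}_t$ or $S \in \mathcal{B}_t$
$\{i,j\}$ | 0 | $S \in \mathcal{A}_t$, $\kappa = 0$
$\{k,l\}$ | $\kappa(s, \{k,l\})$ | $S \in \mathcal{A}_t$ since $\{k,l\} \in \mathcal{A}_s$
$\{i,j,k\}$ | $-\kappa(s, \{k\})$ | $S \in \mathcal{B}_t$ since $\{k\} \in \mathcal{A}_s$
$\{i,j,l\}$ | $-g_{s,l}$ | $S \in \mathcal{B}_t$ since $g_{s,l} \ge 0$

$S$ | $\kappa(r, S)$ | $S \in \mathcal{A}_r$ or $S \in \mathcal{B}_r$
$\{i,l\}$ | 0 | $S \in \mathcal{A}_r$
$\{i,j\}$ | $-\kappa(s, \{i,j\})$ | $S \in \mathcal{A}_r$ since $\{i,j\} \in \mathcal{B}_s$
$\emptyset$ | $\kappa(s, \{i,j\})$ | $S \in \mathcal{B}_r$
$\{l\}$ | $\kappa(s, \{i,j\})$ | $S \in \mathcal{B}_r$ since $\{i,j\} \in \mathcal{B}_s$

Table 6. See theorem 2. Case 4: We consider three sets $\{i,j\}, \{i,k\}, \{j,k\} \in \mathcal{B}_s$ which involve only three elements,
with all three elements repeating in more than one set. Without loss of generality, we assume that $\kappa(s, \{i,j\}) \ge \kappa(s, \{i,k\})$ and
$\kappa(s, \{i,j\}) \ge \kappa(s, \{j,k\})$. In this case we replace the AV $z_s$ using two variables $z_t$ and $z_r$.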
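
The case 4 decomposition can be checked against the parameters listed for the final case of figure 5. The sketch below is illustrative and rests on two reading assumptions: the AV-independent part is taken to be $1 - x_1 - x_2 - x_3 - x_1 x_3 - 2 x_2 x_3$, and the backward AV $z_r$ is taken to multiply terms in $(1 - x_i)$, as the formula of Table 6 suggests. The function names are hypothetical.

```python
from itertools import product


def original(x, g_s=8, w_s=(4, 5, 6, 0)):
    """min over z_s of (g_s - sum_i g_{s,i} x_i) z_s."""
    return min(0, g_s - sum(w * xi for w, xi in zip(w_s, x)))


def decomposed(x):
    """AV-free part plus the minimised forward (z_t) and backward (z_r) parts."""
    x1, x2, x3, x4 = x
    av_free = 1 - x1 - x2 - x3 - x1 * x3 - 2 * x2 * x3
    # (g_t; g_{t,i}) = (4; 2, 2, 2, 0)
    forward = min(0, 4 - 2 * x1 - 2 * x2 - 2 * x3 - 0 * x4)
    # (g_r; g_{r,i}) = (2; 1, 1, 1, 0), acting on the complemented variables
    backward = min(0, 2 - (1 - x1) - (1 - x2) - (1 - x3) - 0 * (1 - x4))
    return av_free + forward + backward


def decomposition_holds():
    """Compare both sides on all 16 labellings."""
    return all(original(x) == decomposed(x) for x in product((0, 1), repeat=4))
```

Under these assumptions the two sides agree on every labelling, which is consistent with $g_t = 2(g_s - g_{s,k})$ and $g_r = -2\kappa(s, \{i,j\})$ for $(i, j, k) = (1, 2, 3)$.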



C Useful Lemmas 

It is not completely clear as to why a few basis AVs can replace several hundreds of AVs in a function in
$\mathcal{F}^2$. We observed some differences between the general partitions and the reference partitions. We found that not all
partitions satisfy the conditions of a matroid. However, the reference partitions form matroids in third and
fourth order functions. We summarize these results in the following lemmas.

Lemma 6. The ordered pair $(S_4, \mathcal{A})$ corresponding to an arbitrary partition does not necessarily form a matroid.

Proof. An ordered pair $(E, \mathcal{I})$ is a matroid if it satisfies the three conditions given in Definition 11. The
first condition is to show that $\emptyset \in \mathcal{I}$. As per the definition of the partition, $[\emptyset, \mathcal{P}]$ is a valid partition where
$\mathcal{A} = \emptyset$. The second matroid condition can be obtained by using lemma 9 in the reverse direction for subsets
of $\mathcal{A}$. The third matroid condition states that if $|I_1| < |I_2|$ and $I_1, I_2 \in \mathcal{I}$ then there exists an element $e$
in $I_2$ such that $I_1 \cup e \in \mathcal{I}$. However, this condition is not true for all partitions. For example, consider a
partition $[\mathcal{A} = \{\emptyset, \{1\}, \{2\}, \{3\}, \{4\}, \{3,4\}\}, \mathcal{B} = \mathcal{P} \setminus \mathcal{A}]$. Let $I_1 = \{1\}$ and $I_2 = \{3,4\}$ be the two
independent sets. Although $|I_1| < |I_2|$, there is no element $e$ in $I_2$ satisfying $I_1 \cup e \in \mathcal{A}$. □

Lemma 7. The ordered pair $(S_4, \mathcal{A}_f)$ corresponding to the forward reference partition is a uniform matroid
$U_2$ (see Definition 13).

Proof. It can be easily seen that the subsets of $\mathcal{A}_f$ satisfy the three conditions given in Definition 11. In
addition, every $A \in \mathcal{A}_f$ satisfies the condition $|A| \le 2$. Thus $(S_4, \mathcal{A}_f)$ forms a uniform matroid $U_2$. □
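
The matroid conditions of Definition 11 are easy to test mechanically for small families. The sketch below checks them for the family of all subsets of $S_4$ with at most two elements (the family $\mathcal{A}_f$ of lemma 7) and for the counterexample family used in the proof of lemma 6. The function and variable names are illustrative only.

```python
from itertools import combinations

GROUND = (1, 2, 3, 4)


def is_matroid(family):
    """Check the three independence axioms of Definition 11."""
    family = {frozenset(s) for s in family}
    # 1. The empty set is independent.
    if frozenset() not in family:
        return False
    # 2. Every subset of an independent set is independent.
    for s in family:
        for r in range(len(s)):
            if any(frozenset(c) not in family for c in combinations(s, r)):
                return False
    # 3. Exchange: a smaller independent set can be extended from a larger one.
    for a in family:
        for b in family:
            if len(a) < len(b) and not any(a | {e} in family for e in b - a):
                return False
    return True


# All subsets of S4 with at most two elements.
A_FORWARD = [c for r in range(3) for c in combinations(GROUND, r)]

# Family A from the proof of lemma 6.
LEMMA6_EXAMPLE = [(), (1,), (2,), (3,), (4,), (3, 4)]
```

`is_matroid(A_FORWARD)` holds, while `is_matroid(LEMMA6_EXAMPLE)` fails at the exchange condition for $I_1 = \{1\}$ and $I_2 = \{3, 4\}$.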

Lemma 8. The ordered pair $(S_4, \mathcal{B}_b)$ corresponding to the backward reference partition is a uniform matroid
$U_1$ (see Definition 13).

Proof. It can be easily seen that the subsets of $\mathcal{B}_b$ satisfy the three conditions given in Definition 11. In
addition, every $B \in \mathcal{B}_b$ satisfies the condition $|B| < 2$. Thus $(S_4, \mathcal{B}_b)$ forms a uniform matroid $U_1$. □

It can also be shown that the ordered pair $(S_4, \mathcal{B}_f)$ is the dual of the uniform matroid $(S_4, \mathcal{A}_f)$ (see
Definition 12). Similarly the ordered pair $(S_4, \mathcal{A}_b)$ is the dual of the uniform matroid $(S_4, \mathcal{B}_b)$.
The partitions are nothing but lattices and thus they satisfy the following property.

Lemma 9. Every AV $z_s$ that is associated with a partition separates $\mathcal{P}$ into sets $\mathcal{A}_s$ and $\mathcal{B}_s$ such that if any
set $B \in \mathcal{B}_s$, then every set $S \supseteq B$ is also an element of $\mathcal{B}_s$.

Proof. If a set $B \in \mathcal{B}_s$ then $\kappa(s, B) = (g_s - \sum_{i=1}^{4} g_{s,i} 1_i^B) < 0$. Since $S \supseteq B$ and $g_{s,i} \ge 0$, $\forall i \in S_4$, we have
$(g_s - \sum_{i=1}^{4} g_{s,i} 1_i^S) \le (g_s - \sum_{i=1}^{4} g_{s,i} 1_i^B)$. This implies that the partition coefficient $\kappa(s, S) < 0$ and thus
$S \in \mathcal{B}_s$. □

Lemma 10. For an AV $z_s : [\mathcal{A}_s, \mathcal{B}_s]$, if $\mathcal{A}_s = \emptyset$ we can transform a function $h(\mathbf{x}, z_s)$ in $\mathcal{F}^2$ to some function
$h'(\mathbf{x})$ in $\mathcal{F}^2$. Similarly for an AV $z_t : [\mathcal{A}_t, \mathcal{B}_t]$, if $\mathcal{B}_t = \emptyset$ we can transform a function $h(\mathbf{x}, z_t)$ in $\mathcal{F}^2$ to some
function $h'(\mathbf{x})$ in $\mathcal{F}^2$.

Proof. If $\mathcal{A}_s = \emptyset$ then $\arg\min_{z_s} h(\mathbf{x}, z_s) = 1$, $\forall \mathbf{x}$. Hence $\min_{z_s} h(\mathbf{x}, z_s) = h(\mathbf{x}, 1) = h'(\mathbf{x})$. Similarly
when $\mathcal{B}_t = \emptyset$, $\min_{z_t} h(\mathbf{x}, z_t) = h(\mathbf{x}, 0) = h'(\mathbf{x})$. □

Lemma 11. If $z_s$ and $z_t$ are two AVs in a function $h(\mathbf{x}, z_s, z_t)$ in $\mathcal{F}^2$ sharing the same partition, then $h$ can
be transformed to some $h'(\mathbf{x}, z)$ in $\mathcal{F}^2$ having a single AV with the same partition.

Proof. The partition of $z_s$ is independent of $z_t$ and vice versa, since there is no $z_s z_t$ term in $h(\mathbf{x}, z_s, z_t)$.
Thus while studying the partition of $z_s$ we can treat $z_t$ as a constant. Since $z_s$ and $z_t$ have the same partition
property, $z_s = z_t$ at $\arg\min_{z_s, z_t} h(\mathbf{x}, z_s, z_t)$, $\forall \mathbf{x}$. Thus we can replace $z_s$ and $z_t$ using a single variable $z$ in
an equivalent function $h'(\mathbf{x}, z)$. □
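
The merging step of lemma 11 can be illustrated with a small sketch. It assumes that two AVs share a partition when their partition coefficients never have opposite strict signs, and it merges them by adding their coefficients; this particular merge rule and the function names are illustrative assumptions, since the lemma itself only states that an equivalent single-AV function exists.

```python
from itertools import product


def av_min(g, weights, x):
    """min over z in {0, 1} of (g - sum_i weights[i] * x[i]) * z."""
    return min(0, g - sum(w * xi for w, xi in zip(weights, x)))


def merge(av_a, av_b):
    """Merge two AVs, each given as (g, weights), by adding their coefficients."""
    (g_a, w_a), (g_b, w_b) = av_a, av_b
    return g_a + g_b, tuple(p + q for p, q in zip(w_a, w_b))


def merged_is_equivalent(av_a, av_b):
    """True if the merged AV reproduces the sum of the two minima everywhere."""
    g, w = merge(av_a, av_b)
    return all(
        av_min(*av_a, x) + av_min(*av_b, x) == av_min(g, w, x)
        for x in product((0, 1), repeat=4)
    )
```

For instance, the AVs $(2; 1, 1, 1, 1)$ and $(4; 2, 2, 2, 2)$ both switch on exactly when $|S| \ge 3$, and merging them into $(6; 3, 3, 3, 3)$ preserves the minimum. The AVs $(3; 1, 1, 1, 1)$ and $(1; 1, 1, 1, 1)$ have different partitions, and the same merge fails.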



